New EP Application 2004

National Public Health Institute et al.
Our Ref.: K 1114EP

IDENTIFICATION OF SNPs ASSOCIATED WITH HYPERLIPEMIA, 
DYSLIPIDEMIA AND DEFECTIVE CARBOHYDRATE METABOLISM 

The present invention relates to a nucleic acid molecule comprising a chromosomal 
region contributing to or indicative of hyperlipidemias and/or dyslipidemias and/or 
defective carbohydrate metabolism, wherein said nucleic acid molecule is selected 
from the group consisting of: (a) a nucleic acid molecule having or comprising the
nucleic acid sequence of SEQ ID NO: 1, wherein said nucleic acid sequence has 
one or more mutations having an effect on USF1 function; (b) a nucleic acid 
molecule having or comprising the nucleic acid sequence of SEQ ID NO: 1, wherein 
said nucleic acid sequence is characterized by comprising a guanine or an adenine 
residue in position 3966 in intron 7 of the USF1 sequence; and/or (c) a nucleic acid 
molecule having or comprising the nucleic acid sequence of SEQ ID NO: 1, wherein 
said nucleic acid sequence is characterized by comprising a cytosine or a thymine 
residue in position 5205 in exon 11 of the USF1 sequence; wherein said nucleic acid
molecule extends, at a maximum, 50000 nucleotides over the 5' and/or 3' end of the
nucleic acid molecule of SEQ ID NO: 1. The present invention further relates to a 
diagnostic composition comprising a nucleic acid molecule encoding USF1 or a 
fragment thereof, the nucleic acid molecule disclosed herein, the vector, the primer 
or primer pair of the present invention or an antibody specific for USF1. Finally, the 
present invention relates to the use of the nucleic acid molecule of the invention for 
the preparation of a pharmaceutical composition for the treatment of hyperlipidemia, 
dyslipidemia, coronary heart disease, type II diabetes, metabolic syndrome, 
hypertension or atherosclerosis. 

A variety of documents is cited throughout this specification. The disclosure content 
of these documents, including manufacturer's manuals and catalogues, is herewith 
incorporated by reference. 

Familial combined hyperlipidemia (FCHL) is characterized by elevated levels of
serum total cholesterol (TC), triglycerides (TG), or both (refs. 1, 2). Recently, the
first major locus for FCHL was identified on human chromosome 1q21-q23 in 31
Finnish FCHL families (ref. 4). This finding has been replicated in FCHL families
from other, more heterogeneous populations (refs. 5-7). In addition, genome-wide
scans have identified several other putative loci for FCHL in Finnish and Dutch
study samples (refs. 8, 9). Interestingly, the same markers in the 1q21 region have
also been linked to type 2 diabetes mellitus (T2DM) in numerous studies
(refs. 10-14), including a Finnish study (ref. 15). The evidence for linkage obtained
for 1q21 has varied in these FCHL and T2DM studies, most likely reflecting genetic
heterogeneity as well as population-based and diagnostic differences. Importantly,
however, many of the critical metabolic features of FCHL, e.g. hypertriglyceridemia
and insulin resistance, also represent trait components of T2DM. Interestingly, a
rodent locus for combined hyperlipidemia was linked to a region on mouse
chromosome 3, potentially orthologous with human 1q21 (ref. 16). The underlying
gene, thioredoxin interacting protein (TXNIP), was recently identified, providing a
strong positional candidate for human FCHL (ref. 17).

As pointed out above, familial combined hyperlipidemia (FCHL) is characterized by
elevated levels of serum total cholesterol (TC), triglycerides (TG), or both
(refs. 1, 2). This complex disorder is the most common familial hyperlipidemia, with
a prevalence of 1% to 2% in Western populations (ref. 1). FCHL constitutes a
powerful genetic factor in atherosclerosis since it is observed in about 20% of
coronary heart disease (CHD) patients under 60 years (ref. 3). Despite tremendous
efforts to identify the molecular mechanisms underlying FCHL, its etiology remains
unknown. As a consequence, it is presently not possible to diagnose or treat
patients affected by familial combined hyperlipidemia (FCHL).

In view of the above, the technical problem underlying the present invention was to 
provide means and methods that allow for an accurate and convenient diagnosis of
hyperlipidemias and/or dyslipidemias or defective carbohydrate metabolism or of
a predisposition to these conditions. 

The solution to said technical problem is achieved by the embodiments 
characterized in the claims. 

Thus, the present invention relates to a nucleic acid molecule comprising a 
chromosomal region contributing to or indicative of hyperlipidemias and/or 
dyslipidemias or defective carbohydrate metabolism, wherein said nucleic acid
molecule is selected from the group consisting of: (a) a nucleic acid molecule having
or comprising the nucleic acid sequence of SEQ ID NO: 1, wherein said nucleic acid
sequence has one or more mutations having an effect on USF1 function; (b) a
nucleic acid molecule having or comprising the nucleic acid sequence of SEQ ID
NO: 1, wherein said nucleic acid sequence is characterized by comprising a guanine
or an adenine residue in position 3966 in intron 7 of the USF1 sequence; and/or (c)
a nucleic acid molecule having or comprising the nucleic acid sequence of SEQ ID
NO: 1, wherein said nucleic acid sequence is characterized by comprising a
cytosine or thymine residue in position 5205 in exon 11 of the USF1 sequence;
wherein said nucleic acid molecule extends, at a maximum, 50000 nucleotides over the
5' and/or 3' end of the nucleic acid molecule of SEQ ID NO: 1. In preferred 
embodiments, the nucleic acid molecule extends up to 40000 nucleotides or up to 
25000 nucleotides or up to 5000 nucleotides over the 5' and/or 3' end of the nucleic 
acid molecule of SEQ ID NO: 1. 

The term "hyperlipidemias and dyslipidemias" refers to diseases associated with
increased levels of serum total cholesterol and/or triglycerides, as well as increased
levels of low-density lipoprotein (LDL) cholesterol and/or apolipoprotein B and/or
decreased levels of serum high-density lipoprotein (HDL) cholesterol and/or small
dense LDL. In accordance with the present invention such diseases include familial
combined hyperlipidemia (FCHL), hypercholesterolemia, hypertriglyceridemia,
hypoalphalipoproteinemia, hyperapobetalipoproteinemia (hyperapoB), familial
dyslipidemic hypertension (FDH), hypertension, coronary heart disease and
atherosclerosis.

In accordance with the invention, the term "defective carbohydrate metabolism" 
refers to glucose intolerance and insulin resistance. Defective carbohydrate
metabolism might therefore be indicative of diseases such as type 2 diabetes 
mellitus (T2DM) and metabolic syndrome. 

The term "contributing to or indicative of hyperlipidemias and/or dyslipidemias or
defective carbohydrate metabolism" refers to the fact that the SNPs, and thus the
corresponding nucleic acid molecules found, are indicative of the condition and
possibly also causative therefor. Accordingly, this term necessarily requires that
the recited position is indicative of the condition. Said term, on the other hand, does
not necessarily require that the particular position containing the SNP is actually
causative or contributes to the condition. Yet, said term does not exclude a
causative or contributory role of either or both SNPs.

The nucleotide sequence designated SEQ ID NO:1 is a genomic nucleotide
sequence of 5687 bp, representing USF1 as deposited under databank accession
number RefSeq: NM_007122 for the human USF1 mRNA with the corresponding
genomic sequence as deposited under >hg16_refGene_NM_007122
range=chr1:158225833-158231519 in the UCSC Genome Browser on Human in
July 2003. For the purpose of the present invention, the activity or function of the
polypeptide encoded by this nucleotide sequence is defined as "wild-type USF1
protein activity". Likewise, SEQ ID NO:1 is understood as representing wild-type
USF1 if sequence position 3966 is an adenine and sequence position 5205 is a
thymine. USF1 is known as a transcription factor, capable of binding to the
recognition sequence CACGTG termed E box and capable of regulating the
expression of genes such as apolipoproteins CIII (APOC3), AII (APOA2), APOE,
hormone sensitive lipase (LIPE), fatty acid synthase (FAS), glucokinase (GCK),
glucagon receptor (GCGR), ATP-binding cassette, subfamily A (ABCA1), renin
(REN) and angiotensinogen (AGT). Moreover, USF1 is known to interact with other
factors of the cellular transcription machinery, such as USF2.
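As a consistency check on the coordinates quoted above, the inclusive range chr1:158225833-158231519 spans exactly 5687 positions, which matches the stated length of SEQ ID NO:1. The following minimal sketch performs that arithmetic and also shows how occurrences of the E box recognition sequence CACGTG might be located in a sequence string. The function names and the toy sequence are illustrative assumptions and are not part of the specification.

```python
def inclusive_length(start: int, end: int) -> int:
    """Number of positions in a 1-based, inclusive coordinate range."""
    return end - start + 1


def find_e_boxes(seq: str, motif: str = "CACGTG") -> list:
    """Return the 1-based start positions of every (possibly overlapping) motif hit."""
    seq = seq.upper()
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i + 1)      # convert the 0-based index to a 1-based position
        i = seq.find(motif, i + 1)   # continue the search one base further on
    return positions


# Length of the genomic range quoted for SEQ ID NO:1.
seq_id_no_1_length = inclusive_length(158225833, 158231519)

# Hypothetical toy sequence (not taken from SEQ ID NO:1) containing two E boxes.
toy_sequence = "ttCACGTGaaacacgtgCC"
e_box_positions = find_e_boxes(toy_sequence)
```

The E box is its own reverse complement, so scanning one strand is sufficient for this motif.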

The term "one or more mutations having an effect on USF1 function" refers to
mutations affecting USF1 function. Throughout the present invention the terms
"function" and "activity" are used interchangeably. Since USF1 is a transcription
factor, the term "USF1 function" refers to its activity as a transcription factor
including its specificity to its target recognition sequence on the genomic DNA, its
protein interaction sequences and its capability of modulating or regulating
transcription. It is important to note, however, that mutations outside of the coding
region of USF1 can also have an effect on USF1 function. Such mutations are, for
example, mutations affecting the amount of USF1 transcribed in a cell (including
mutations affecting promoter activity) or mutations that have an impact on splicing
or intracellular transport of the RNA transcripts. Any of these mutations is also
comprised by the present invention.



The term "nucleic acid molecule" refers both to naturally and non-naturally occurring 
nucleic acid molecules. Non-naturally occurring nucleic acid molecules include 
cDNA as well as derivatives such as PNA. 

The term "nucleic acid molecule [...] comprising the nucleic acid sequence of SEQ
ID NO:", as used throughout this specification, refers to nucleic acid molecules that
are at least 1 nucleotide longer than the nucleic acid molecule specified by the SEQ
ID NO. At the same time, these nucleic acid molecules extend, at a maximum,
50000 nucleotides over the 5' and/or 3' end of the nucleic acid molecule of the
invention specified e.g. by SEQ ID NO: 1.

A number of previous studies in mammals have tried to identify chromosomal
regions contributing to or associated with familial combined hyperlipidemia. A rodent
locus for combined hyperlipidemia was linked to a region on mouse chromosome 3,
potentially orthologous with human 1q21 (ref. 16). The underlying gene, thioredoxin
interacting protein (TXNIP), was recently identified, providing a strong positional
candidate for human FCHL (ref. 17). Surprisingly, the results disclosed by the present
invention show that two single-nucleotide polymorphisms located in intron 7 and
exon 11, respectively, of human USF1 are associated with hyperlipidemias,
dyslipidemias and defective carbohydrate metabolism. The disclosed
polymorphisms allow individuals to be screened for the presence of, or a
predisposition to, hyperlipidemia and/or dyslipidemia and/or defective carbohydrate
metabolism.

In a preferred embodiment, the nucleic acid molecule of the present invention is 
genomic DNA. This preferred embodiment of the invention reflects the fact that 
usually the analysis would be carried out on the basis of genomic DNA from body 
fluid, cells or tissue isolated from the person under investigation. In a further
preferred embodiment of the nucleic acid molecule of the invention said genomic 
DNA is part of a gene. In accordance with the invention, it is preferred that at least 
intron 7 of the USF1 gene harboring SNP1 in position 3966 and/or exon 11 of the 
USF1 gene harboring SNP2 in position 5205 relative to the USF1 gene is analyzed. 



It is a central aspect of the present invention that a guanine residue in position 3966
of the USF1 gene indicates the presence of a disease-associated allele, whereas an
adenine residue in the same position of the USF1 gene is indicative of the healthy
allele. Likewise, a cytosine residue in position 5205 of the USF1 gene indicates the
presence of a disease-associated allele, whereas a thymine residue is indicative of
the healthy allele.
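The allele assignments stated above can be captured in a small lookup. The sketch below is illustrative only: it assumes the sequence is supplied as a plain string in the numbering and strand orientation of SEQ ID NO:1, and the names `SNP_ALLELES` and `classify_allele` are hypothetical.

```python
# Allele assignments as stated in the description (1-based positions in SEQ ID NO:1).
SNP_ALLELES = {
    3966: {"healthy": "A", "disease": "G"},   # intron 7
    5205: {"healthy": "T", "disease": "C"},   # exon 11
}


def classify_allele(seq: str, position: int) -> str:
    """Classify the base found at one of the two SNP positions."""
    if position not in SNP_ALLELES:
        raise ValueError("position must be 3966 or 5205")
    base = seq[position - 1].upper()          # convert to a 0-based index
    alleles = SNP_ALLELES[position]
    if base == alleles["disease"]:
        return "disease-associated"
    if base == alleles["healthy"]:
        return "healthy"
    return "unknown"


# Hypothetical placeholder sequence of 5687 bases; only the two SNP positions are set.
toy = ["N"] * 5687
toy[3966 - 1] = "G"
toy[5205 - 1] = "T"
toy = "".join(toy)

call_3966 = classify_allele(toy, 3966)
call_5205 = classify_allele(toy, 5205)
```

Here `call_3966` is "disease-associated" and `call_5205` is "healthy". A real genotyping workflow would in addition have to handle both chromosomal copies and the strand orientation of the read.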

The present invention also relates to a fragment of the nucleic acid molecule of the
present invention having at least 20 nucleotides wherein said fragment comprises
nucleotide position 3966 and/or position 5205 of SEQ ID NO:1. The fragment of the 
invention may be of natural as well as of (semi)synthetic origin. Thus, the fragment 
may, for example, be a nucleic acid molecule that has been synthesized according 
to conventional protocols of organic chemistry. Importantly, the nucleic acid 
fragment of the invention comprises nucleotide position 3966 in intron 7 of the USF1 
gene or nucleotide position 5205 in exon 11 of the USF1 gene. In these positions, 
the fragment may have either the wild-type nucleotide or the nucleotide contributing 
to or indicative of hyperlipidemia and/or dyslipidemia and/or defective carbohydrate 
metabolism (also referred to as the "mutant" or "disease-associated" sequence). 
Consequently, the fragment of the invention may be used, for example, in assays 
differentiating between the wild-type and the mutant sequence. 

It is further preferred that the fragment of the invention consists of at least 17
nucleotides, more preferred at least 20 nucleotides, and most preferred at least 25
nucleotides such as 30 nucleotides. Preferably, however, the fragment is of up to
100bp, up to 200bp, up to 300bp, up to 400bp, up to 500bp, up to 600bp, up to
700bp, up to 800bp, up to 900bp or up to 1000bp in length.
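To illustrate the length limits just given, the following sketch cuts a fragment of a chosen length out of a longer sequence so that it covers a given SNP position. It is a minimal example under stated assumptions: the helper name `fragment_covering`, the choice to place the SNP roughly in the centre of the fragment, and the placeholder sequence are all hypothetical.

```python
MIN_LEN, MAX_LEN = 17, 1000   # bounds taken from the preferred fragment lengths above


def fragment_covering(seq: str, snp_pos: int, length: int):
    """Return (fragment, 1-based index of the SNP within the fragment)."""
    if not MIN_LEN <= length <= MAX_LEN:
        raise ValueError("length outside the preferred range")
    if not 1 <= snp_pos <= len(seq):
        raise ValueError("SNP position outside the sequence")
    if length > len(seq):
        raise ValueError("sequence shorter than the requested fragment")
    # Centre the SNP where possible, then clamp the window to the sequence ends.
    start = snp_pos - 1 - length // 2
    start = max(0, min(start, len(seq) - length))
    return seq[start:start + length], snp_pos - start


# Hypothetical placeholder sequence with a G at position 3966.
toy = "C" * 3965 + "G" + "C" * 1721
fragment, snp_index = fragment_covering(toy, 3966, 25)
```

With these inputs the fragment is 25 nucleotides long and the SNP sits at index 13 within it.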

Furthermore, the invention relates to a nucleic acid molecule which is
complementary to the nucleic acid molecule of the present invention and which has
a length of at least 17 or of at least 20 nucleotides. Preferably, however, the
complementary nucleic acid molecule is of up to 100bp, up to 200bp, up to 300bp,
up to 400bp, up to 500bp, up to 600bp, up to 700bp, up to 800bp, up to 900bp or up
to 1000bp in length.
This embodiment of the invention comprising at least 15 or at least 20 nucleotides
and covering at least position 3966 or position 5205 of the USF1 gene is particularly
useful in the analysis of the genetic setup in the recited positions in hybridization
assays. Thus, for example, a 15 mer exactly complementary either to the wild-type
sequence or to the variants contributing to or indicative of hyperlipidemia and/or
dyslipidemia and/or defective carbohydrate metabolism may be used to differentiate
between the polymorphic variants. This is because a nucleic acid molecule labeled
with a detectable label not exactly complementary to the DNA in the analyzed
sample will not give rise to a detectable signal, if appropriate hybridization and
washing conditions are chosen.
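The principle of allele-specific hybridization can be sketched in idealised form: a probe is treated as giving a signal only if the target contains a site that is exactly complementary to it. This is a deliberate simplification of real hybridization behaviour, and the 15-mer sequences below are made-up examples, not taken from SEQ ID NO:1.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_hybridizes(probe: str, target: str) -> bool:
    """Idealised stringent conditions: signal only on a perfect complementary match."""
    return reverse_complement(probe).upper() in target.upper()


# Hypothetical 15-mer sites differing only at the central (8th) base.
wild_type_site = "GATTACAAGCTTGCA"                          # A at the polymorphic position
mutant_site = wild_type_site[:7] + "G" + wild_type_site[8:]  # G at the same position

# Probes exactly complementary to each variant.
wild_type_probe = reverse_complement(wild_type_site)
mutant_probe = reverse_complement(mutant_site)

signal_mutant_on_mutant = probe_hybridizes(mutant_probe, mutant_site)
signal_mutant_on_wild_type = probe_hybridizes(mutant_probe, wild_type_site)
```

In this model the mutant probe gives a signal on the mutant site but not on the wild-type site, which mirrors the single-mismatch discrimination described above.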

In this regard, it is important to note that the nucleic acid molecule of the invention,
the fragment thereof as well as the complementary nucleic acid molecule may be
detectably labeled. Detectable labels include radioactive labels such as ³H or ³²P or
fluorescent labels. Labeling of nucleic acids is well understood in the art and
described, for example, in Sambrook et al., "Molecular Cloning, A Laboratory
Manual"; ISBN: 0879695765, CSH Press, Cold Spring Harbor, 2001.

Hybridization is preferably performed under stringent or highly stringent conditions.
"Stringent or highly stringent conditions" of hybridization are well known to or can be
established by the person skilled in the art according to conventional protocols.
Appropriate stringent conditions for each sequence may be established on the basis
of well-known parameters such as temperature, composition of the nucleic acid
molecules, salt conditions etc.: see, for example, Sambrook et al., "Molecular
Cloning, A Laboratory Manual"; ISBN: 0879695765, CSH Press, Cold Spring
Harbor, 2001 and earlier edition Sambrook et al., "Molecular Cloning, A Laboratory
Manual"; CSH Press, Cold Spring Harbor, 1989 or Higgins and Hames (eds.),
"Nucleic acid hybridization, a practical approach", IRL Press, Oxford 1985
(reference 54), see in particular the chapter "Hybridization Strategy" by Britten &
Davidson, 3 to 15. Typical (highly stringent) conditions comprise hybridization at
65°C in 0.5xSSC and 0.1% SDS or hybridization at 42°C in 50% formamide, 4xSSC
and 0.1% SDS. Hybridization is usually followed by washing to remove unspecific
signal. Washing conditions include conditions such as 65°C, 0.2xSSC and 0.1%
SDS or 2xSSC and 0.1% SDS or 0.3xSSC and 0.1% SDS at 25°C - 65°C.
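Because stringency depends on salt concentration and temperature, it can help to convert the SSC dilutions named above into sodium ion concentrations. The sketch below assumes the standard composition of 1xSSC (0.15 M NaCl and 0.015 M trisodium citrate), which the specification itself does not spell out. The ranking function is a crude heuristic that ignores formamide and SDS, and the temperature paired with each example wash is illustrative.

```python
# Assumed standard composition of 1xSSC (not stated in the specification).
NACL_M_1X = 0.15       # mol/l NaCl
CITRATE_M_1X = 0.015   # mol/l trisodium citrate (three Na+ per citrate)


def sodium_molarity(ssc_fold: float) -> float:
    """Total Na+ concentration (mol/l) of an SSC dilution, e.g. 0.5 for 0.5xSSC."""
    return ssc_fold * (NACL_M_1X + 3 * CITRATE_M_1X)


def rank_by_stringency(conditions):
    """Sort (label, temp_c, ssc_fold) tuples from most to least stringent.

    Heuristic only: lower salt is treated as more stringent, and a higher
    temperature breaks ties.
    """
    return sorted(conditions, key=lambda c: (sodium_molarity(c[2]), -c[1]))


# Illustrative washes; temperatures are example pairings.
washes = [
    ("2xSSC, 25°C", 25, 2.0),
    ("0.2xSSC, 65°C", 65, 0.2),
    ("0.3xSSC, 65°C", 65, 0.3),
]
ranked = [label for label, _, _ in rank_by_stringency(washes)]
```

Under the stated assumption, 1xSSC corresponds to about 0.195 M Na+, so a 0.2xSSC wash contains roughly a tenth of the sodium of a 2xSSC wash.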
Hybridization may also be performed under conditions of lower stringency. The
parameters of such hybridization conditions are described in more detail in
Sambrook et al., "Molecular Cloning, A Laboratory Manual"; ISBN: 0879695765,
CSH Press, Cold Spring Harbor, 2001. A non-limiting example of low stringency
hybridization conditions is hybridization in 35% formamide, 5xSSC, 50 mM
Tris-HCl (pH 7.5), 5 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 mg/ml
denatured salmon sperm DNA, 10% (wt/vol) dextran sulfate at 40°C,
followed by one or more washes in 2xSSC, 25 mM Tris-HCl (pH 7.4), 5 mM
EDTA, and 0.1% SDS at 50°C. Other conditions of low stringency that may
be used are well known in the art (e.g., as employed for cross-species
hybridizations). See, e.g., Ausubel et al. (eds.), 1993, CURRENT PROTOCOLS IN
MOLECULAR BIOLOGY, John Wiley & Sons, NY, and Kriegler, 1990, GENE
TRANSFER AND EXPRESSION, A LABORATORY MANUAL, Stockton Press, NY;
Shilo and Weinberg, 1981, Proc Natl Acad Sci USA 78: 6789-6792.

In addition, the invention relates to a vector comprising the nucleic acid molecule as 
described herein above. The vectors may particularly be plasmids, cosmids, viruses 
or bacteriophages used conventionally in genetic engineering that comprise the 
nucleic acid molecule of the invention. Preferably, said vector is an expression 
vector and/or a gene transfer or targeting vector. Expression vectors derived from 
viruses such as retroviruses, vaccinia virus, adeno-associated virus, herpes viruses, 
or bovine papilloma virus, may be used for delivery of the nucleic acid molecule of 
the invention into a targeted cell population. Methods which are well known to those
skilled in the art can be used to construct recombinant viral vectors; see, for 
example, the techniques described in Sambrook et al., loc. cit. and Ausubel et al., 
Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley
Interscience, N.Y. (2001). Alternatively, the nucleic acid molecules and vectors of 
the invention can be reconstituted into liposomes for delivery to target cells. The 
vectors containing the nucleic acid molecules of the invention can be transferred 
into the host cell by well-known methods, which vary depending on the type of 
cellular host. For example, calcium chloride transfection is commonly utilized for 
prokaryotic cells, whereas, e.g., calcium phosphate or DEAE-Dextran mediated
transfection or electroporation may be used for other cellular hosts; see Sambrook,
supra. 

Such vectors may comprise further genes such as marker genes which allow for the 
selection of said vector in a suitable host cell and under suitable conditions. 
Preferably, the nucleic acid molecule of the invention is operatively linked to 
expression control sequences allowing expression in prokaryotic or eukaryotic cells. 
Expression of said polynucleotide comprises transcription of the polynucleotide into 
a translatable mRNA. Regulatory elements ensuring expression in eukaryotic cells, 
preferably mammalian cells, are well known to those skilled in the art. They usually 
comprise regulatory sequences ensuring initiation of transcription and, optionally, a 
poly-A signal ensuring termination of transcription and stabilization of the transcript, 
and/or an intron further enhancing expression of said polynucleotide. Additional 
regulatory elements may include transcriptional as well as translational enhancers, 
and/or naturally-associated or heterologous promoter regions. Possible regulatory 
elements permitting expression in prokaryotic host cells comprise, e.g., the PL, lac, 
trp or tac promoter in E. coli, and examples of regulatory elements permitting
expression in eukaryotic host cells are the AOX1 or GAL1 promoter in yeast or the
CMV, SV40 or RSV (Rous sarcoma virus) promoter, CMV enhancer, SV40
enhancer or a globin intron in mammalian and other animal cells. Besides elements
which are responsible for the initiation of transcription, such regulatory elements may
also comprise transcription termination signals, such as the SV40-poly-A site or the
tk-poly-A site, downstream of the polynucleotide. Optionally, the heterologous
sequence can encode a fusion protein including a C- or N-terminal identification
peptide imparting desired characteristics, e.g., stabilization or simplified purification
of the expressed recombinant product. In this context, suitable expression vectors are
known in the art such as Okayama-Berg cDNA expression vector pcDV1
(Pharmacia), pCDM8, pRc/CMV, pcDNAI, pcDNA3, the Echo™ Cloning System
(Invitrogen), pSPORT1 (GIBCO BRL) or pRevTet-On/pRevTet-Off or pCI
(Promega). 

Preferably, the expression control sequences will be eukaryotic promoter systems in 
vectors capable of transforming or transfecting eukaryotic host cells, but control 
sequences for prokaryotic hosts may also be used. 
As mentioned above, the vector of the present invention may also be a gene
transfer or targeting vector. Gene therapy, which is based on introducing therapeutic
genes into cells by ex-vivo or in-vivo techniques, is one of the most important
applications of gene transfer. Suitable vectors and methods for in-vitro or in-vivo
gene therapy are described in the literature and are known to the person skilled in
the art; see, e.g., Giordano, Nature Medicine 2 (1996), 534-539; Schaper, Circ. Res.
79 (1996), 911-919; Anderson, Science 256 (1992), 808-813; Isner, Lancet 348
(1996), 370-374; Muhlhauser, Circ. Res. 77 (1995), 1077-1086; Wang, Nature
Medicine 2 (1996), 714-716; WO 94/29469; WO 97/00957; Schaper, Current Opinion
in Biotechnology 7 (1996), 635-640; or Kay et al., Nature Medicine 7 (2001), 33-40,
and references cited therein. The polynucleotides and vectors of the invention may
be designed for direct introduction or for introduction via liposomes, or viral vectors
(e.g. adenoviral, retroviral) into the cell. Preferably, said cell is a germ line cell,
embryonic cell, or egg cell or derived therefrom, most preferably said cell is a stem
cell. Gene therapy is envisaged with the wild-type nucleic acid molecule only.

The invention also relates to a primer or primer pair, wherein the primer or primer 
pair hybridizes under stringent conditions to the nucleic acid molecule of the present 
invention comprising nucleotide positions 3966 and/or 5205 of SEQ ID NO:1 or to the
complementary strand thereof. In a preferred embodiment, said primer has an 
adenine or a guanine residue in the position corresponding to position 3966 of the 
USF1 sequence. In another preferred embodiment, said primer has a cytosine or a 
thymine residue in the position corresponding to position 5205 of the USF1 
sequence. The primer may bind to the coding (+) strand or to the non-coding (-) 
strand of the DNA double strand. 

Preferably, the primers of the invention have a length of at least 14 nucleotides such 
as 17, 20 or 21 nucleotides. The fact that in one embodiment the target sequence of 
the primer is located 3' to the SNP is to ensure that the primer is actually useful for 
sequence analysis, i.e. that the elongated primer sequence actually contains the 
SNP. When a PCR reaction is performed, for example, usually two primers are 
involved, wherein one primer binds 3' of the SNP on the + strand and the other 
primer binds 3' of the SNP on the - strand. 
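The primer geometry described above can be checked with a short sketch: the primer that anneals to the - strand has the same sequence as the + strand upstream of the SNP, and the primer that anneals to the + strand is the reverse complement of the + strand downstream of the SNP, so that the amplified product contains the SNP. The template, the primers and the function name `amplicon_contains_snp` are hypothetical examples, and the check requires perfect primer matches.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_contains_snp(template: str, fwd: str, rev: str, snp_pos: int) -> bool:
    """True if both primers match the template and flank the 1-based SNP position.

    `template` is the + strand; `fwd` anneals to the - strand (same sequence as
    the + strand) and `rev` anneals to the + strand downstream of the SNP.
    """
    t = template.upper()
    f_start = t.find(fwd.upper())
    r_start = t.find(reverse_complement(rev).upper())
    if f_start == -1 or r_start == -1:
        return False
    f_end = f_start + len(fwd)        # 1-based position of the last forward-primer base
    return f_end < snp_pos <= r_start  # SNP lies strictly between the two primer sites


# Hypothetical 41-base template with the SNP (G) at position 21.
left = "ATGCCGTTAGCTAGGCTTAA"
right = "CCGGATATCGTACGATTGCA"
template = left + "G" + right

fwd_primer = left[:17]                         # 17-mer on the upstream side
rev_primer = reverse_complement(right[-17:])   # 17-mer binding 3' of the SNP on the + strand

flanks_snp = amplicon_contains_snp(template, fwd_primer, rev_primer, 21)
```

A primer length of 17 nucleotides is used here because it is one of the lengths named in the text; any other admissible length would work in the same way.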




In one embodiment, the primer actually binds to the position of the SNP. As a 
consequence, when binding is performed under stringent conditions, such a primer 
is useful to distinguish between different polymorphic variants as binding only 
occurs if the sequences of the primer and the target have full complementarity. It is 
further preferred that the primers have a maximum length of 24 nucleotides. 
However, in particular cases it may be preferable to use primers with a maximum 
length of 30 or 35 nucleotides. Hybridization or lack of hybridization of a primer
under appropriate conditions to a genome sequence comprising either position 3966
or position 5205, coupled with an appropriate detection method such as an
elongation reaction or an amplification reaction, may be used to differentiate
between the polymorphic variants and then draw conclusions with regard to, e.g.,
the predisposition of the person under investigation to hyperlipidemia and/or
dyslipidemia and/or defective carbohydrate metabolism. The present invention 
envisages two types of primers/primer pairs. One type hybridizes to a sequence 
comprising the mutant, i.e. disease-associated, sequence. In other words, one
nucleotide of the primer pairs with the guanine residue in position 3966 (or the 
cytosine residue of the complementary strand) or with the cytosine residue in
position 5205 (or the guanine residue in the complementary strand). The other type
of primer is exactly complementary to a sequence of wild-type. Since hybridization 
conditions would preferably be chosen to be stringent enough, contacting of e.g. a 
primer exactly complementary to the mutant sequence with a wild-type allele would 
not result in efficient hybridization due to the mismatch formation. After washing, no 
signal would be detected due to the removal of the primer. 

Additionally, the invention relates to a non-human host transformed with the vector 
of the invention as described herein above. The host may either carry the mutant or 
the wild-type sequence. Upon breeding etc. the host may be heterozygous or 
homozygous for one or both SNPs. 

The host of the invention may carry the vector of the invention either transiently or 
stably integrated into the genome. Methods for generating the non-human host of 
the invention are well known in the art. For example, conventional transfection 
protocols described in Sambrook et al., loc. cit., may be employed to generate
transformed bacteria (such as E. coli) or transformed yeasts. The non-human host
of the invention may be used, for example, to elucidate the onset of hyperlipidemia
and/or dyslipidemia and/or defective carbohydrate metabolism. 

In a preferred embodiment of the invention the non-human host is a bacterium, a 
yeast cell, an insect cell, a fungal cell, a mammalian cell, a plant cell, a transgenic 
animal or a transgenic plant. 

Whereas E. coli is a preferred bacterium, preferred yeast cells are S. cerevisiae or 
Pichia pastoris cells. Preferred fungal cells are Aspergillus cells and preferred insect 
cells include Spodoptera frugiperda cells. Preferred mammalian cells are CHO cells, 
colon carcinoma and hepatoma cell lines showing expression of the USF1 
transcription factor. However, cell lines with very low expression of USF1,
including HeLa cells and the like or fibroblasts, might also be particularly useful for
specific experiments.

A method for the production of a transgenic non-human animal, for example a
transgenic mouse, comprises introduction of the aforementioned polynucleotide or
targeting vector into a germ cell, an embryonic cell, stem cell or an egg or a cell 
derived therefrom. The non-human animal can be used in accordance with a 
screening method of the invention described herein. Production of transgenic 
embryos and screening of those can be performed, e.g., as described by A. L. 
Joyner Ed., Gene Targeting, A Practical Approach (1993), Oxford University Press. 
The DNA of the embryonal membranes of embryos can be analyzed using, e.g., 
Southern blots with an appropriate complementary nucleic acid molecule; see 
supra. A general method for making transgenic non-human animals is described in 
the art, see for example WO 94/24274. For making transgenic non-human 
organisms (which include homologously targeted non-human animals), embryonal 
stem cells (ES cells) are preferred. Murine ES cells, such as the AB-1 line grown on
mitotically inactive SNL76/7 cell feeder layers (McMahon and Bradley, Cell 62:1073- 
1085 (1990)) essentially as described (Robertson, E. J. (1987) in Teratocarcinomas 
and Embryonic Stem Cells: A Practical Approach. E. J. Robertson, ed. (Oxford: IRL 
Press), p. 71-112) may be used for homologous gene targeting. Other suitable ES 
lines include, but are not limited to, the E14 line (Hooper et al., Nature 326:292-295 
(1987)), the D3 line (Doetschman et al., J. Embryol. Exp. Morph. 87:27-45 (1985)), 
the CCE line (Robertson et al., Nature 323:445-448 (1986)) and the AK-7 line (Zhuang
et al., Cell 77:875-884 (1994)). The success of generating a mouse line from ES
cells bearing a specific targeted mutation depends on the pluripotence of the ES
cells (i.e., their ability, once injected into a host developing embryo, such as a
blastocyst or morula, to participate in embryogenesis and contribute to the germ
cells of the resulting animal). The blastocysts containing the injected ES cells are
allowed to develop in the uteri of pseudopregnant nonhuman females and are born
as chimeric mice. The resultant transgenic mice, which are chimeric for cells having
the desired nucleic acid molecule, are backcrossed and screened for the presence of
the correctly targeted transgene(s) by PCR or Southern blot analysis on tail biopsy
DNA of offspring so as to identify transgenic mice heterozygous for the nucleic acid
molecule of the invention.

The transgenic non-human animals may, for example, be transgenic mice, rats, 
hamsters, dogs, monkeys (apes), rabbits, pigs, or cows. Preferably, said transgenic 
non-human animal is a mouse. The transgenic animals of the invention are, inter 
alia, useful to study the phenotypic expression/outcome of the nucleic acids and 
vectors of the present invention. Furthermore, the transgenic animals of the present 
invention are useful to study the developmental expression of the USF1 gene and of 
its role for onset of hyperlipidemia and/or dyslipidemia and/or defective 
carbohydrate metabolism, for example in the rodent intestine. It is furthermore 
envisaged, that the non-human transgenic animals of the invention can be 
employed to test for therapeutic agents/compositions or other possible therapies
which are useful for treating hyperlipidemia and/or dyslipidemia and/or defective
carbohydrate metabolism.

The present invention also relates to a pharmaceutical composition comprising 
USF1 or a fragment thereof, a nucleic acid molecule encoding USF1 or a fragment 
thereof, or an antibody specific for USF1. 

The components of the pharmaceutical composition of the invention may be 
combined with a pharmaceutically acceptable carrier and/or diluent and/or excipient.
Preferably, USF1 refers to any USF1 being capable of alleviating the disease 
symptoms. Generally, USF1 will be of wild-type. However, in particular cases it 
might also be useful to administer mutated USF1 having one or more point
mutations, insertions, deletions and the like and showing increased or decreased 
function or activity. Also encompassed by the present invention are chemically 
modified molecules which improve uptake or stability of a polypeptide. 

Examples of suitable pharmaceutical carriers are well known in the art and include 
phosphate buffered saline solutions, water, emulsions, such as oil/water emulsions, 
various types of wetting agents, sterile solutions etc. Compositions comprising such 
carriers can be formulated by well known conventional methods. These 
pharmaceutical compositions can be administered to the subject at a suitable dose. 
Administration of the suitable compositions may be effected by different ways, e.g., 
by intravenous, intraperitoneal, subcutaneous, intramuscular, topical, intradermal, 
intranasal or intrabronchial administration. The dosage regimen will be determined 
by the attending physician and clinical factors. As is well known in the medical arts,
dosages for any one patient depend upon many factors, including the patient's
size, body surface area, age, the particular compound to be administered, sex, time
and route of administration, general health, and other drugs being administered
concurrently. A typical dose can be, for example, in the range of 0.001 to 1000 µg of
nucleic acid for expression or for inhibition of expression; however, doses below or
above this exemplary range are envisioned, especially considering the
aforementioned factors. Dosages will vary but a preferred dosage for intravenous
administration of DNA is from approximately 10^6 to 10^12 copies of the DNA
molecule. Progress can be monitored by periodic assessment. The compositions of 
the invention may be administered locally or systemically. Administration will
generally be parenteral, e.g., intravenous; DNA may also be administered
directly to the target site, e.g., by biolistic delivery to an internal or external target 
site or by catheter to a site in an artery. Preparations for parenteral administration 
include sterile aqueous or non-aqueous solutions, suspensions, and emulsions. 
Examples of non-aqueous solvents are propylene glycol, polyethylene glycol, 
vegetable oils such as olive oil, and injectable organic esters such as ethyl oleate. 
Aqueous carriers include water, alcoholic/aqueous solutions, emulsions or 
suspensions, including saline and buffered media. Parenteral vehicles include 
sodium chloride solution, Ringer's dextrose, dextrose and sodium chloride, lactated
Ringer's, or fixed oils. Intravenous vehicles include fluid and nutrient replenishers,
electrolyte replenishers (such as those based on Ringer's dextrose), and the like.
Preservatives and other additives may also be present such as, for example, 
antimicrobials, anti-oxidants, chelating agents, and inert gases and the like. 
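
The copy-number dosage mentioned above can be related to a DNA mass with simple arithmetic. The following is a minimal sketch, not part of the original disclosure. It assumes double-stranded DNA and the commonly used approximation of about 650 g/mol per base pair. The function name and the 5000 bp construct length are hypothetical and chosen only for illustration.

```python
AVOGADRO = 6.02214076e23          # molecules per mole
AVG_MASS_PER_BP = 650.0           # g/mol per base pair (approximation, assumption)


def dna_copies_to_micrograms(copies: float, length_bp: int) -> float:
    """Convert a number of double-stranded DNA molecules to mass in micrograms."""
    if copies < 0 or length_bp <= 0:
        raise ValueError("copies must be >= 0 and length_bp must be > 0")
    grams = copies * length_bp * AVG_MASS_PER_BP / AVOGADRO
    return grams * 1e6  # grams -> micrograms


if __name__ == "__main__":
    # Hypothetical 5000 bp construct at the two ends of the stated copy range.
    for copies in (1e6, 1e12):
        print(f"{copies:.0e} copies: {dna_copies_to_micrograms(copies, 5000):.3g} ug")
```

Under these assumptions, 10^12 copies of a 5000 bp construct correspond to a few micrograms of DNA.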

Additionally, the invention relates to a diagnostic composition comprising a nucleic 
acid molecule encoding USF1 or a fragment thereof, the nucleic acid molecule as 
described herein above, the vector as described herein above, the primer or primer 
pair as described herein above or an antibody specific for USF1.

The diagnostic composition is useful for assessing the genetic status of a person 
with respect to his or her predisposition to develop hyperlipidemia and/or 
dyslipidemia and/or defective carbohydrate metabolism or with regard to the 
diagnosis of the acute condition. The various possible components of the diagnostic 
composition may be packaged in one or more vials, in a solvent or otherwise such 
as in lyophilized form. If dissolved in a solvent, the diagnostic composition is 
preferably cooled to at least +8°C to +4°C. Freezing may be preferred in other 
instances. 

The present invention also relates to a method for testing for the presence or 
predisposition of hyperlipidemia and/or dyslipidemia and/or defective carbohydrate 
metabolism, comprising analyzing a sample obtained from a prospective patient or 
from a person suspected of carrying such a predisposition for the presence of a 
wild-type or variant allele of the USF1 gene. Preferably, said variant comprises an 
SNP at position 3966 and/or at position 5205 of the USF1 gene in a homozygous or 
heterozygous state. In various embodiments, the test may be either for the
presence of the wild-type sequence(s) or of the mutant sequence(s). It is in
accordance with the present invention that a guanine residue in position 3966 of the
USF1 gene indicates the presence of a disease-associated allele, whereas an 
adenine residue in the same position of the USF1 gene is indicative for the healthy 
allele. Likewise, a cytosine residue in position 5205 of the USF1 gene indicates the 
presence of a disease-associated allele, whereas a thymine residue is indicative for 
the healthy allele.

The method of the invention is useful for detecting the genetic set-up of said
person/patient and drawing appropriate conclusions whether a condition from which 
said patient suffers is hyperlipidemia and/or dyslipidemia and/or defective 
carbohydrate metabolism. Alternatively, it may be assessed whether a person not 
suffering from a condition carries a predisposition to hyperlipidemia and/or 
dyslipidemia and/or defective carbohydrate metabolism. With regard to position
5205 in exon 11 of the USF1 gene, only if cytosine is found in a homozygous or
heterozygous state, a condition would be diagnosed as hyperlipidemia and/or 
dyslipidemia and/or defective carbohydrate metabolism or a corresponding 
predisposition would be manifest. On the other hand, if thymine is found in a 
homozygous state, then it may be concluded that a condition from which a patient 
suffers is not related to hyperlipidemia or dyslipidemia and/or defective carbohydrate 
metabolism and further, that the patient does not carry a predisposition to develop 
this condition. The situation is similar and essentially the same conclusions apply for 
the analysis of the SNP in position 3966: With regard to position 3966 in intron 7 of 
the USF1 gene, only if guanine is found in a homozygous or heterozygous state, a 
condition would be diagnosed as hyperlipidemia and/or dyslipidemia and/or 
defective carbohydrate metabolism or a corresponding predisposition would be 
manifest. On the other hand, if an adenine is found in a homozygous state, then it 
may be concluded that a condition from which a patient suffers is not related to 
hyperlipidemia or dyslipidemia and/or defective carbohydrate metabolism and 
further, that the patient does not carry a predisposition to develop this condition. 
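
The interpretation rule described above can be sketched as a small lookup. This is an illustrative sketch only. The allele assignments (G and C disease-associated, A and T healthy) come from the text, while the function name and the returned labels are hypothetical.

```python
# Alleles as stated in the text; all names below are illustrative assumptions.
RISK_ALLELE = {3966: "G", 5205: "C"}      # disease-associated alleles
HEALTHY_ALLELE = {3966: "A", 5205: "T"}   # alleles indicative of the healthy state


def interpret_genotype(position: int, allele_1: str, allele_2: str) -> str:
    """Classify a diploid genotype at USF1 position 3966 or 5205."""
    if position not in RISK_ALLELE:
        raise ValueError(f"unsupported position: {position}")
    alleles = (allele_1.upper(), allele_2.upper())
    allowed = {RISK_ALLELE[position], HEALTHY_ALLELE[position]}
    for allele in alleles:
        if allele not in allowed:
            raise ValueError(f"unexpected allele {allele!r} at position {position}")
    risk_count = alleles.count(RISK_ALLELE[position])
    if risk_count == 2:
        return "homozygous disease-associated"
    if risk_count == 1:
        return "heterozygous disease-associated"
    return "homozygous healthy"


if __name__ == "__main__":
    print(interpret_genotype(3966, "G", "A"))
    print(interpret_genotype(5205, "T", "T"))
```

Only the homozygous healthy genotype leads to the conclusion that neither the condition nor a predisposition is present. Both other classes carry at least one disease-associated allele.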

In a preferred embodiment of the method of the invention said testing comprises 
hybridizing the complementary nucleic acid molecule as described herein above 
which is complementary to the nucleic acid molecule contributing to or indicative of 
hyperlipidemia and/or dyslipidemia and/or defective carbohydrate metabolism or the 
nucleic acid molecule as described herein above which is complementary to the 
wild-type sequence as a probe under (highly) stringent conditions to nucleic acid 
molecules comprised in said sample and detecting said hybridization, wherein said 
complementary nucleic acid molecule comprises the sequence position containing 
the SNP.

Again, depending on the nucleic acid probe used, either wild-type or mutant
sequences (i.e. sequences contributing to or indicative of hyperlipidemia and/or 
dyslipidemia and/or defective carbohydrate metabolism) would be detected. It is 
understood that hybridization conditions would be chosen such that a nucleic acid 
molecule complementary to wild-type sequences would not or essentially not 
hybridize to the mutant sequence. Similarly, a nucleic acid molecule complementary
to the mutant sequence would not or essentially not hybridize to the wild-type
sequence. In order to differentiate between results obtained from homozygous
and heterozygous genotypes in the hybridization methods of the invention, one can 
for example monitor/detect the strength/intensity of the respective detection signal 
after the hybridization. To differentiate between wild-type homozygous, 
heterozygous and/or mutant homozygous alleles in the hybridization methods of the 
invention, internal control samples of the corresponding genotypes will be included 
in the analysis. 

In a further preferred embodiment, the method of the invention further comprises 
digesting the product of said hybridization with a restriction endonuclease or 
subjecting the product of said hybridization to digestion with a restriction 
endonuclease and analyzing the product of said digestion. 

This preferred embodiment of the invention allows, by convenient means, the
differentiation between an effective hybridization and a non-effective hybridization.
For example, if the DNA sequence adjacent to position 3966 or position 5205 
comprises an endonuclease restriction site, the hybridized product will be cleavable 
by an appropriate restriction enzyme upon an effective hybridization whereas a lack 
of hybridization will yield no double-stranded product or will not comprise the 
recognizable restriction site and, accordingly, will not be cleaved. Suitable restriction 
enzymes may be found, for example, by the use of the program Webcutter. The 
analysis of the digestion product can be effected by conventional means, such as by
gel electrophoresis, which may optionally be combined with staining of the nucleic
acid with, for example, ethidium bromide. Combinations with further techniques such
as Southern blotting are also envisaged. 
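
Whether a restriction enzyme can distinguish two alleles depends on whether its recognition site is present with one allele and absent with the other. The sketch below illustrates that check under stated assumptions. The two short sequences are invented for illustration and are not taken from SEQ ID NO: 1. GAATTC is the well-known EcoRI recognition site, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    """True if the recognition site occurs on either strand of seq."""
    seq, site = seq.upper(), site.upper()
    return site in seq or reverse_complement(site) in seq


def site_distinguishes_alleles(seq_allele_1: str, seq_allele_2: str, site: str) -> bool:
    """True if exactly one of the two allele sequences carries the site."""
    return has_site(seq_allele_1, site) != has_site(seq_allele_2, site)


if __name__ == "__main__":
    # Invented flanking sequences differing at a single position (A versus G).
    allele_a = "CCTTGAATTCAAGG"
    allele_g = "CCTTGAGTTCAAGG"
    print(site_distinguishes_alleles(allele_a, allele_g, "GAATTC"))
```

A site present in both sequences, or in neither, cannot be used to tell the alleles apart by digestion.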



Detection of said hybridization may be effected, for example, by an anti-DNA 
double-strand antibody or by employing a labeled oligonucleotide. Conveniently, the 
method of the invention is employed together with blotting techniques such as 
Southern or Northern blotting and related techniques. Labeling may be effected, for 
example, by standard protocols and includes labeling with radioactive markers, 
fluorescent, phosphorescent, chemiluminescent, enzymatic labels, etc. The label 
can be located at the 5' and/or 3' end of the nucleic acid molecule or be located at 
an internal position. Preferred labels include, but are not limited to, fluorochromes,
e.g. carboxyfluorescein (FAM) and 6-carboxy-X-rhodamine (ROX), fluorescein
isothiocyanate (FITC), rhodamine, Texas Red, phycoerythrin, allophycocyanin,
6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein
(JOE), 6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein
(5-FAM) or N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), and radioactive labels,
e.g. 32P, 35S, 3H, etc. The label may also be a two stage system, where the probe is
conjugated to biotin, haptens, etc. having a high affinity binding partner, e.g. avidin, 
specific antibodies, etc., where the binding partner is conjugated to a detectable 
label. 

In accordance with the above, in another preferred embodiment of the method of the 
invention said probe is detectably labeled, e.g. by the methods and with the labels 
described herein above. 

In yet another preferred embodiment of the method of the invention said testing 
comprises determining the nucleic acid sequence of at least a portion of the nucleic 
acid molecule as described herein above, said portion comprising the position of the 
SNP. Determination of the nucleic acid sequence may be effected in accordance
with one of the conventional protocols such as the Sanger or Maxam/Gilbert
protocols (see Sambrook et al., loc. cit., for further guidance).

In a further preferred embodiment of the method of the invention the determination 
of the nucleic acid sequence is effected by solid-phase minisequencing. Solid-phase 
minisequencing is based on quantitative analysis of the wild type and mutant 
nucleotide in a solution. First, the genomic region containing the mutation is
amplified by PCR with one biotinylated and one non-biotinylated primer, whereby the
product is attached via the biotinylated primer to a streptavidin (SA) coated plate.
The PCR product
is denatured to a single stranded form to allow a minisequencing primer to bind to
this strand just before the site of the mutation. The tritium (3H) or fluorescence
labeled mutated and wild type nucleotides together with nonlabeled dNTPs are
added to the minisequencing reaction, and the primer is extended using Taq polymerase. The
result is based on the amount of wild type and mutant nucleotides in the reaction
as measured by a beta counter or fluorometer and expressed as an R-ratio. See also
Syvanen AC, Sajantila A, Lukka M. Am J Hum Genet 1993;52:46-59 and
Suomalainen A and Syvanen AC. Methods Mol Biol 1996;65:73-79. 
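
The R-ratio read-out can be illustrated with a short calculation. This is a sketch under stated assumptions and not the cited authors' procedure. Here R is taken as the wild-type signal divided by the mutant signal, and the cut-offs of 0.1 and 10 are hypothetical placeholders that would in practice be set from control samples of known genotype.

```python
def r_ratio(signal_wild_type: float, signal_mutant: float) -> float:
    """Ratio of incorporated wild-type signal to incorporated mutant signal."""
    if signal_wild_type < 0 or signal_mutant <= 0:
        raise ValueError("signals must be non-negative and the mutant signal > 0")
    return signal_wild_type / signal_mutant


def call_genotype(r: float, low: float = 0.1, high: float = 10.0) -> str:
    """Assign a genotype class from R using hypothetical cut-offs."""
    if r >= high:
        return "homozygous wild type"
    if r <= low:
        return "homozygous mutant"
    return "heterozygous"


if __name__ == "__main__":
    # Invented counts (e.g. cpm from a beta counter) for three samples.
    for wt, mut in ((5000.0, 100.0), (2400.0, 2000.0), (80.0, 4000.0)):
        r = r_ratio(wt, mut)
        print(f"R = {r:.2f}: {call_genotype(r)}")
```

Because background incorporation is never exactly zero in practice, the sketch rejects a zero mutant signal instead of returning an infinite ratio.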

A preferred embodiment of the method of the invention further comprises, prior to 
determining said nucleic acid sequence, amplification of at least said portion of said 
nucleic acid molecule. Preferably, amplification is effected by polymerase chain 
reaction (PCR). Other amplification methods such as ligase chain reaction may also 
be employed. 

In a preferred embodiment of the method of the invention said testing comprises 
carrying out an amplification reaction wherein at least one of the primers employed 
in said amplification reaction is the primer as described herein above or belongs to 
the primer pair as described herein above, and assaying for an amplification
product. In this embodiment and depending on the information the
investigator/physician wishes to obtain, primers hybridizing either to the wild-type or 
mutant sequences may be employed. In a particularly preferred embodiment, at 
least one of the primers will actually bind to the position of the SNP. As a 
consequence, when binding is performed under stringent conditions, such a primer 
is useful to distinguish between different polymorphic variants as binding only 
occurs if the sequences of the primer and the target have full complementarity. 

The method of the invention will result in an amplification of only the target 
sequence, if said target sequence carries a sequence exactly complementary to the 
primer used for hybridization. This is because the oligonucleotide primer will, under
preferably (highly) stringent hybridization conditions, not hybridize to the wild-type
or mutant sequence, depending on which type of primer is used (with the
consequence that no amplification product is obtained), but only to the exactly
matching sequence. Naturally, combinations of primer pairs hybridizing to both
SNPs may be used. In this case, the analysis of the amplification products expected 
(which may be no, one, two, three or four amplification product(s) if the second, non- 
differentiating primer is the same for each locus) will provide information on the 
genetic status of both positions 3966 and 5205. 
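
The number of expected amplification products follows directly from which allele-specific primers are used and which alleles the sample carries. The sketch below makes that bookkeeping explicit. It assumes ideal allele-specific priming, so a product forms only on an exact match, and the function and variable names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# One allele-specific primer per allele and locus (alleles as named in the text).
ALL_PRIMERS: List[Tuple[int, str]] = [(3966, "G"), (3966, "A"), (5205, "C"), (5205, "T")]


def expected_products(
    genotype: Dict[int, str],
    primers: Iterable[Tuple[int, str]] = ALL_PRIMERS,
) -> List[Tuple[int, str]]:
    """Return the (position, allele) primers expected to yield a product.

    genotype maps each position to its two alleles, e.g. {3966: "GA", 5205: "TT"}.
    """
    products = []
    for position, allele in primers:
        if allele.upper() in genotype[position].upper():
            products.append((position, allele.upper()))
    return products


if __name__ == "__main__":
    sample = {3966: "GA", 5205: "TT"}
    print(expected_products(sample))
    # Using only the primers for the disease-associated alleles:
    print(expected_products({3966: "AA", 5205: "TT"}, [(3966, "G"), (5205, "C")]))
```

With all four primers a diploid sample gives two to four products. Fewer products, including none, arise when only a subset of the primers is used.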

In a preferred embodiment of the method of the invention said amplification is 
effected by or said amplification is the polymerase chain reaction (PCR). The PCR 
is well established in the art. Typical conditions to be used in accordance with the 
present invention include, for example, a total of 35 cycles in a total volume of 50 µl,
exemplified with a denaturation step at 93 °C for 3 minutes; an annealing step at
55 °C for 30 seconds; an extension step at 72 °C for 75 seconds; and a final extension
step at 72 °C for 10 minutes.

The present invention further relates to a method for testing for the presence or 
predisposition of hyperlipidemia and/or dyslipidemia and/or defective carbohydrate 
metabolism comprising assaying a sample obtained from a human for the amount of 
USF1 contained in said sample. The amount of USF1 can be determined by any 
suitable method. Preferably, the amount of USF1 is determined by contacting the 
sample, i.e. USF1 contained in the sample, with an antibody or aptamer or a 
derivative thereof, which is specific for USF1. For example, the sample containing 
USF1 may be analyzed in a Western blot or in a RIA assay. In this context a weaker 
staining for the presence of the antigen of the invention compared to homozygous 
wild type control samples (comprising two persistent alleles) is indicative for the 
heterozygous wild type (one persistent allele and one disease-associated allele), 
whereas for the homozygous disease state no staining or a reduced staining is 
expected if the appropriate antibody is used. Preferably, the method of the invention 
is performed in the presence of control samples corresponding to all three possible 
allelic combinations as internal controls. Testing may be carried out with an antibody 
or aptamer etc. specific for the wild-type or specific for the mutant sequence. 
Testing for binding may, again, involve the employment of standard techniques such 
as ELISAs; see, for example, Harlow and Lane 53 , loc. cit. The term "antibody" as 
used throughout the invention refers to monoclonal antibodies, polyclonal 
antibodies, single chain antibodies, or a fragment thereof. Preferably the antibody is
specific for USF1 or for wild-type or disease-associated USF1. The antibodies may
be bispecific antibodies, humanized antibodies, synthetic antibodies, antibody
fragments, such as Fab, F(ab')2, Fv or scFv fragments etc., or a chemically
modified derivative of any of these (all comprised by the term "antibody").
Monoclonal antibodies can be prepared, for example, by the techniques as originally
described in Köhler and Milstein, Nature 256 (1975), 495, and Galfre, Meth.
Enzymol. 73 (1981), 3, which comprise the fusion of mouse myeloma cells to spleen 
cells derived from immunized mammals with modifications developed by the art. 
Antibodies may be labelled by using any of the labels described in the present 
invention. 

In a preferred embodiment of the method of the invention said antibody or aptamer 
is detectably labeled. Whereas the aptamers are preferably radioactively labeled 
with 3H or 32P or with a fluorescent marker, the antibody may either be labeled in a
corresponding manner (with 131I as the preferred radioactive label) or be labeled
with a tag such as His-tag, FLAG-tag or myc-tag. 

In a further preferred embodiment of the method of the invention the test is an 
immuno-assay. 

In another preferred embodiment of the method of the invention said sample is 
blood, serum, plasma, fetal tissue, saliva, urine, mucosal tissue, mucus, vaginal 
tissue, fetal tissue obtained from the vagina, skin, hair, hair follicle or another human 
tissue. 

In an additional preferred embodiment of the method of the invention said nucleic 
acid molecule from said sample is fixed to a solid support. 

Fixation of the nucleic acid molecule to a solid support will allow an easy handling of 
the test assay and furthermore, at least some solid supports such as chips, silica 
wafers or microtiter plates allow for the simultaneous analysis of larger numbers of 
samples. Ideally, the solid support allows for automated testing employing, for
example, robotic devices.

In a particularly preferred embodiment of the method of the invention said solid
support is a chip, a silica wafer, a bead or a microtiter plate. 

The methods of the present invention may be performed ex vivo, in vitro or in vivo. 

The present invention also relates to the use of a nucleic acid molecule encoding 
USF1, the nucleic acid molecule as described herein above, or of USF1 polypeptide 
for the analysis of the presence or predisposition of hyperlipidemia, dyslipidemia 
and/or defective carbohydrate metabolism. The nucleic acid molecule 
simultaneously allows for the analysis of the absence of the condition or the 
predisposition to the condition, as has been described in detail herein above. In 
particular cases, it may be possible to use USF1 polypeptides for testing. This may 
be, for example, in cases when expression of USF1 results in an autoimmune 
response against USF1. In such cases it will be possible, by using USF1 
polypeptides, to monitor patients by detecting antibodies directed against USF1. 
Such assays can, for example, be based on the Western blotting technique or on
(radio)immunoprecipitation.

In addition, the present invention relates to the use of USF1 or a fragment thereof, a 
nucleic acid molecule encoding USF1 and/or comprising at least the wild-type 
sequence of intron 7 and/or exon 11 of USF1, for the preparation of a 
pharmaceutical composition for the treatment of hyperlipidemias and/or 
dyslipidemias, including familial combined hyperlipidemia (FCHL), 
hypercholesterolemia, hypertriglyceridemia, hypoalphalipoproteinemia,
hyperapobetalipoproteinemia (hyperapoB) and/or familial dyslipidemic hypertension
(FDH), coronary heart disease, type II diabetes, atherosclerosis or metabolic 
syndrome. Any of the diseases mentioned in the present invention can be treated by 
administering to a patient USF1 in an amount and quality sufficient to ameliorate the 
symptoms of the disease. If for example the disease symptoms are created by a 
reduced amount of USF1 in the patient, administration of USF1 to the patient will 
compensate for the reduced USF1 of the patient. USF1 may be provided to the 
patient as such, i.e. as the polypeptide. Alternatively, a nucleic acid molecule 
encoding USF1 can be administered. Preferably, USF1 is a full length wild-type
polypeptide. However, in particular cases it might also be useful to administer
mutated USF1 having one or more point mutations, insertions, deletions and the like
and showing increased or decreased function or activity. Also encompassed by the
present invention are chemically modified molecules which improve uptake or
stability of a polypeptide. Gene therapy approaches have been discussed herein
above in connection with the vector of the invention and equally apply here. It is of
note that in accordance with this invention, also fragments of the nucleic acid
molecules as defined herein above may be employed in gene therapy approaches.
Said fragments comprise the nucleotide at position 3966 or position 5205 of the
USF1 gene. Preferably, said fragments comprise at least 200, at least 250, at least
300, at least 400 and most preferably at least 500 nucleotides. In a preferred
embodiment of the use of the invention said gene therapy treats or prevents
hyperlipidemia and/or dyslipidemia and/or defective carbohydrate metabolism.

Finally, the present invention relates to a kit comprising the nucleic acid molecule, 
the primer or primer pair and/or the vector of the present invention in one or more 
containers.

The figures show:



Figure 1: Schematic overview of the associated region on 1q21. Genes for 
which we genotyped SNPs as well as the locations of the peak linkage 
markers D1S104 and D1S1677 (Pajukanta et al. 1998) are shown in 
the uppermost part. The genes indicated in bold were also sequenced. 
Next part shows the SNPs genotyped for JAM1 and USF1 (see Table 
2 for distances, rs numbers and LD clusters of these SNPs). The 
second to lowest part indicates the SNPs associated with TGs in men, 
and the lowest part the SNPs associated with FCHL and TGs in all 
family members. 

Figure 2: Distribution of genes according to functional category for the 16 up- 
regulated and 60 down-regulated genes for which annotation 
information for the gene ontology (GO) class Biological process was 
available. Only categories scoring a statistically significant EASE-score 
(<0.05) for over-representation are shown. Complete results of the 
EASE analysis including the corresponding EASE scores (p-values) 
and the lists of genes in every significant category are given in the 
Supplementary Table 3a-b. 

Figure 3a: Intron 7 of USF1 harbors the 60-bp sequence shared by the 91 USF1- 
similarity genes. Parts (2-61 bp and 137-196 bp) of the AluSx repeat in 
intron 7 of USF1 have sequence similarities with the mouse B1 repeat. 
A total of 91 human genes, including USF1, have this 60-bp part of 
AluSx located either on the coding strand (43 genes) or on the 
opposite strand (48 genes). These 91 genes are listed in the 
Supplementary Table 4. 



Figure 3b: Transcription efficiency of a 268-bp region in intron 7 of USF1
containing the critical 60-bp sequence and the usf1s2 SNP (see Fig.
3a). DNAs from one homozygous susceptibility carrier (haplotype 1-1)
and one homozygous non-carrier (2-2) were cloned into the SEAP
reporter system in both forward and reverse orientations. HC for and
HC rev indicate constructs of a haplotype carrier (1-1) DNA in forward 
and reverse orientations; HNC for and HNC rev indicate constructs of 
a haplotype non-carrier (2-2) DNA in forward and reverse orientations. 
Culture medium from cells transfected with the pSEAP2-Basic vector
was used as a negative control (Neg) and culture medium from cells
transfected with the pSEAP2-Control vector as a positive control (Pos).
The monitoring of the SEAP protein was performed 48
and 72 hours post-transfection. Error bars represent SD of one
experiment done in triplicate. The size of the bar indicates the increase
in transcriptional activity when compared to the negative control which
is set to 1.

The examples illustrate the invention.



EXAMPLE 1: EXPERIMENTAL OUTLINE 

All analyzed FCHL families had a proband with severe CHD and lipid phenotype,
and on average 5-6 FCHL affected family members. These FCHL families exhibiting
extreme and well-defined disease phenotypes were analyzed to identify the
underlying gene contributing to FCHL on 1q21. We selected a regional candidate
gene approach and sequenced four functionally relevant regional candidate genes
on 1q21. The TXNIP, USF1, retinoid X receptor gamma (RXRG), and apolipoprotein
A2 (APOA2) genes were sequenced to identify all possible variants. Of these,
TXNIP initially represented the most promising positional candidate gene, because it
has been shown to underlie the combined hyperlipidemia phenotype in mice 17. The
three additional regional genes were selected for sequencing based on their
functional candidacy and close location (< 2.5 Mb) to the original peak linkage
markers, D1S104 and D1S1677 (Fig. 1). In parallel, we employed a functionally
unbiased, genetic approach, where an initial set of SNPs for genes around the peak
linkage markers were tested for association. A total of 60 SNPs were genotyped for
26 genes on 1q21. Fifty of these SNPs were located within 5.8 Mb, flanking D1S104
and D1S1677. All 60 SNPs were genotyped in 238 family members of 42 FCHL
families, including the 31 families of the original linkage study 4, and the 10 most
promising SNPs in the extended sample of 721 family members from 60 FCHL
families (see below). The results of the 60 SNPs are shown in the Supplementary
Table 1.

EXAMPLE 2: USF1 GENE AS A CANDIDATE GENE



We identified a total of 23 SNPs for the 5687 bp sequence of the USF1 gene
(Supplementary Table 2): Three of these were silent variants in exons, and the rest
were located in the non-coding regions and in the putative promoter. Eight of the 23
SNPs were novel. Initially, we genotyped three SNPs for the USF1 gene: usf1s1
(exon 11), usf1s2 (intron 7), and usf1s7 (exon 2) (the corresponding rs numbers for
the genotyped SNPs are given in Tables 2-3).



k15 
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Table 1 . Multipoint HHRR and gamete competition analyses for the SNPs usf1s1 

(=RS3737787) AND USF1S2 (=RS2073658). 

All values represent p-values for simultaneous analysis of both SNPs. Ns 
indicates non-significant. The first presented p-values were obtained in 60 
extended FCHL families and the values given in parentheses in 42 nuclear 
FCHL families. Gene dropping was performed only in the 60 extended 
FCHL families using at least 50,000 simulations. The segregating haplotype 
was 1-1 (1 indicates the common allele) in all gamete competition analyses 
above; 





FCHL all 


TG all 


FCHL men 


TG men 


Multi-HHRR 


ns (ns) 


0.05 (ns) 


0.009 (ns) 


0.00003 (0.003) 


Gamete 
competition 


0.00002 
(0.005) 


0.00006 (0.008) 


0.0004 
(0.04) 


0.0000009 
(0.004) 


asymptotic p- 
value 










Gamete 
competition 


0.00004 


0.00006 


0.0004 


0.00001 


(Gene dropping) 










empirical p-value 
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Supplementary Table 2. Association and linkage analyses of TXNIP with FCHL. 

LOD indicates the maximum lod score of the parametric two-point or 
multipoint linkage analysis using the MLINK program and a dominant 
mode of inheritance (recombination fraction is given in parentheses); 
ASP indicates the lod score obtained in the affected sib-pair analysis; 
GAMETE indicates the p-values obtained in the Gamete competition 
analysis; HHRR and multi-HHRR the p-values obtained in the 
haplotype-based haplotype relative risk analysis; and HBAT the p- 
value for the test between the TXNIP haplotypes and the FCHL trait. 
Ns indicates non-significant. For the TG trait, the corresponding p- 
values for all association analyses remained non-significant, and both 
two- and multipoint lod scores were < 1 .5. The numbering of the new 
SNP2 is based on the genomic sequence of the TXNIP region at the 
UCSC Genome Browser, July 2003. All of these SNPs were 
genotyped in the extended sample of 721 family members from 60 
FCHL families. 







                          Analysis of single SNPs                                 Analysis of combined SNPs
Method                    SNP1         SNP2             SNP3         SNP4         SNP1-2-3-4
                          rs2236567    -1273 bp C->T    rs9245       rs7211

Linkage
  LOD                     0.4 (0.14)   0.3 (0.12)       0.3 (0.20)   0.6 (0.10)   1.9 (0.11)
  ASP                     0.3          0.3              0.6          0.2

Family-based association
  GAMETE                  ns           ns               ns           ns           ns
  HHRR                    ns           ns               ns           ns           ns
  HBAT                                                                            ns

Heterozygosity            0.11         0.10             0.11         0.12

The usf1s1 and usf1s2 SNPs provided evidence for linkage in the 42 FCHL families with
maximum lod scores of 3.5 and 2.0 for FCHL, and 3.7 and 2.0 for TGs. Combined
analysis of these SNPs also provided some evidence for association with the
gamete competition test for both FCHL (p=0.005) and TGs (p=0.008) (Table 1),
although the results of individual SNPs were non-significant. We also observed a
difference in the allele frequencies between unaffected and affected men, especially
with the TG trait. The frequency of the minor allele of usf1s1 was 22.0% in TG-affected
males and 40% in the unaffected male family members. Since these affected and
unaffected family members represent non-independent groups of males, we tested
usf1s1 and usf1s2 in TG-affected men using the family-based association method,
HHRR, and the gamete competition test: p-values of 0.01 and 0.02 were obtained in
the HHRR analysis and 0.008 and 0.02 in the gamete competition test of the 42
nuclear FCHL families (Table 2). The combined analysis of these SNPs yielded a
p-value of 0.003 in the HHRR test and 0.004 in the gamete competition test for TGs in
men (Table 1).

Table 2. Association analyses of individual SNPs for the JAM1-USF1 region for TGs and
FCHL in men.

All results represent p-values; ns indicates non-significant, HHRR the
haplotype-based haplotype relative risk test, and Gamete the gamete
competition test. LD cluster number in the last column indicates the clusters
of SNPs showing strong intermarker LD (p < 0.00002) in the male probands
with high TGs (>90th age-sex percentile), i.e. the SNPs carrying the same
cluster number are in strong pairwise LD. SNPs indicated in bold were
genotyped in the 60 extended FCHL families, and the values in parentheses
were obtained for these SNPs in the 42 nuclear FCHL families. All other
results were obtained in the 42 nuclear FCHL families.



SNP      rs number   Distance   Heterozygosity/       TGs              TGs               FCHL        FCHL       LD cluster
                     (in bp)    rare allele freq.     HHRR             Gamete            HHRR        Gamete     (I-V)
                                in all family
                                members

jam1s1   rs836       1361       0.41/0.28             0.03             0.009             ns          0.03       I
jam1s2   rs790056    1561       0.36/0.24             ns               0.03              ns          ns         II
jam1s3   rs790055    25608      0.35/0.23             ns               ns                ns          ns         II
jam1s4   new         10572      0.38/0.26             0.06             0.04              ns          ns         I
jam1s5   rs4339888   124?       0.40/[illegible]      [illegible]      [illegible]       [illegible] 0.09       I
jam1s6   rs3766383   951        0.25/0.15             ns               ns                ns          ns         III
usf1s1   rs3737787   1239       0.45/0.34             0.0009 (0.01)    0.00001 (0.008)   0.04        0.05       I
usf1s2   rs2073658   12         0.44/0.33             0.002 (0.02)     0.00006 (0.02)    0.04 (ns)   ns (ns)    I
usf1s3   rs2516841   17         0.40/0.28             ns               ns                ns          ns         II
usf1s4   rs2073657   526        0.48/0.41             ns               ns                ns          ns         IV
usf1s5   rs2516840   1443       0.41/0.29             ns               ns                ns          ns         II
usf1s6   rs2073653   361        0.25/0.14             ns               0.08              ns          ns         III
usf1s7   rs2516839   1249       0.47/0.39             ns (ns)          0.04 (ns)         ns (ns)     ns (ns)    IV
usf1s8   rs2516838   279        0.40/0.28             0.01 (0.05)      0.05 (0.03)       ns (ns)     ns (ns)    V
usf1s9   rs1556259              0.23/0.13             ns               ns                ns          ns         III
Supplementary Table 3. Variants identified by sequencing the USF1 gene in the 31 FCHL
probands of the original linkage study 3.

Location             rs number   Rare allele   Information on LD (in 31 samples)                       Specifics
                                 frequency
                                 (in 31
                                 samples)

-2167                New         0.02                                                                  T/C
-2022                New         0.05                                                                  A/C
-802                 New         0.03                                                                  C/G
Exon 1               rs2516837   0.44          In full LD with rs2516839 and rs2774273                 Not translated
INTRON 1             rs1556259   0.19                                                                  = usf1s9
INTRON 1             rs2516838   0.29                                                                  = usf1s8
Intron 1             rs1556260   0.16          In full LD with SNPs in 1125 bp and 1416 bp;
                                               30/31 samples in LD with rs1556259
Intron 1             rs2774273   0.44          In full LD with rs2516839 and rs2516837
Intron 1 / 1125 bp   New         0.16          In full LD with SNP in 1416 bp;                         C/T
                                               30/31 samples in LD with rs1556259
Intron 1 / 1416 bp   New         0.16          In full LD with the SNP in 1125 bp;                     A/G
                                               30/31 samples in LD with rs1556259
EXON 2               rs2516839   0.44                                                                  Not translated region; = usf1s7
INTRON 2             rs2073653   0.11                                                                  = usf1s6
Intron 3             rs2073655   0.23          In full LD with rs2073658
Intron 5             rs2774276   0.27          29/31 in LD with rs2516840
Intron 6             rs2073656   0.23          In full LD with rs2073658
INTRON 6             rs2516840   0.32                                                                  = usf1s5
Intron 6 / 3411 bp   New         0.05                                                                  C/T
Intron 6 / 3519 bp   New         0.05                                                                  C/T
INTRON 7             rs2073657   0.47                                                                  In AluSx; = usf1s4
INTRON 7             rs2516841   0.31                                                                  In AluSx; = usf1s3
INTRON 7             rs2073658   0.23                                                                  = usf1s2
Intron 9 / 4445 bp   New         0.03                                                                  A/G
EXON 11              rs3737787   0.24                                                                  Not translated region; = usf1s1

Underlined variants were genotyped in the FCHL families. For these SNPs, the
numbers usf1s1-s9, used in the text and Tables 1-3, are also shown; New indicates
that the SNP was not found in the SNP databases. The numbering of the new SNPs
is based on the genomic sequence of USF1 at the UCSC Genome Browser, July
2003 (refGene_NM_007122).

Next, we genotyped these two associated SNPs, usf1s1 and usf1s2, in the larger
study sample of 60 extended FCHL families. Furthermore, 12 additional SNPs were
genotyped for the USF1 region (Table 2, Fig. 1). Of the 23 SNPs identified by
sequencing, we genotyped all the SNPs that were not in strong LD in 31 probands,
excluding six rare SNPs present in three or fewer individuals (Supplementary Table
2). A total of four USF1 SNPs were genotyped in the 60 extended families due to
their promising results in the nuclear study sample and/or LD pattern (Table 2).
When genotyped in the 60 extended FCHL families, the two individual SNPs, usf1s1
and usf1s2, yielded p-values of 0.0009 and 0.002 in the HHRR test as well as
0.00001 and 0.0006 in the gamete competition test for TGs in men (Table 2). The
common allele of both SNPs was more frequently transmitted to the affected
individuals in both tests and with both the FCHL and TG traits. The asymptotic
p-values of the combined analyses of these two SNPs were 0.00003 in the HHRR and
0.0000009 in the combined gamete competition test for TGs in men (Table 1). The
segregating haplotype was 1-1 (1 indicating the common allele). For all TG-affected
family members, the combined analysis also produced evidence of association with
p-values of 0.05 in the HHRR analysis and 0.00006 in the gamete competition test,
again with the segregating haplotype of 1-1 (Table 1).

To confirm that the gamete competition results are indeed significant and not biased 
by such contributors as sparse data, we calculated empirical p-values for all gamete 
competition analyses involving multiple SNPs (Table 1) using gene dropping with at
least 50,000 simulations (see Methods). The obtained empirical p-values were in 
very good agreement with the asymptotic p-values of the gamete competition 
analyses (Table 1), indicating that the observed results do not represent artifacts of 
asymptotic approximations with sparse data. 
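
The gene-dropping idea can be illustrated with a deliberately simplified sketch. It is not the procedure used in the study: it assumes independent parent-child trios rather than extended pedigrees, a single biallelic SNP, and a TDT-type chi-square in place of the gamete competition likelihood ratio statistic. All function names and parameters below are hypothetical.

```python
import random

def drop_founder(freq_common, rng):
    """Founder genotype under HWE: two independent allele draws (1 = common, 2 = rare)."""
    return tuple(1 if rng.random() < freq_common else 2 for _ in range(2))

def transmit(parent, rng):
    """Mendelian segregation: either parental allele is passed on with probability 1/2."""
    return parent[rng.randrange(2)]

def tdt_statistic(trios):
    """TDT-type chi-square (b - c)^2 / (b + c) counted over heterozygous parents.

    Each trio is (father_genotype, mother_genotype, (allele_from_father, allele_from_mother)).
    """
    b = c = 0
    for father, mother, passed in trios:
        for parent, allele in zip((father, mother), passed):
            if parent[0] != parent[1]:          # only heterozygous parents are informative
                if allele == 1:
                    b += 1
                else:
                    c += 1
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)

def empirical_p(observed_stat, n_trios, freq_common, n_sims, seed=0):
    """Proportion of gene-dropping replicates whose statistic equals or exceeds the observed one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        trios = []
        for _ in range(n_trios):
            father = drop_founder(freq_common, rng)
            mother = drop_founder(freq_common, rng)
            trios.append((father, mother, (transmit(father, rng), transmit(mother, rng))))
        if tdt_statistic(trios) >= observed_stat:
            hits += 1
    return hits / n_sims
```

Because the replicates are generated under the null hypothesis of no linkage and no association, a small returned proportion indicates that the observed statistic is unlikely to be an artifact of sparse data.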

After genotyping a total of 15 SNPs in the USF1 region, we identified a pattern of 
association and LD reaching at least 46 kb in men with high TGs and extending 
from the centromeric junctional adhesion molecule 1 (JAM1) gene to the USF1 gene 
(Fig. 1 and Table 2): in addition to usf1s1 and usf1s2, three other SNPs, jam1s1,
jam1s4, and jam1s5, also showed evidence for association in the 42 nuclear FCHL
families for high TGs in men (Table 2). These three SNPs were in strong LD with
usf1s1 and usf1s2 (p < 0.00002). The LD pattern, tested by the Genepop program,
for SNPs in the JAM1-USF1 region is shown in Table 2. In addition to these five
SNPs, one SNP (usf1s8) in intron 1 of USF1 showed some evidence for
association as well (Table 2). This SNP was not in LD with any of the 14 other SNPs 
(Table 2). 

In all affected family members, using both FCHL and TG traits, the evidence for 
association was restricted to usf1s1 and usf1s2 (Table 1) within the USF1 gene.
The remaining 13 SNPs genotyped for the JAM1-USF1 region did not provide
significant evidence for association. However, we observed that two additional 
USF1 SNPs among those 23 SNPs identified by sequencing, rs2073655 in intron 3 
and rs2073656 in intron 6, were also in full LD with the associated usf1s2 in 31 
FCHL probands and are likely to extend the FCHL-associated region to intron 3 of 
USF1. No association was obtained with SNPs residing outside the JAM1-USF1 
region (Supplementary Table 1). In conclusion, evidence for association and LD was 
restricted to a 1239 bp region within the USF1 gene in all affected individuals of
FCHL families but extended at least 46 kb within the JAM1-USF1 region in men with
high TGs (Tables 2-3, Fig. 1). 

The combination of the usf1s1-usf1s2 SNPs, resulting in the significant haplotypes 
for FCHL and TGs, was also tested with three additional qualitative lipid traits: high 
apolipoprotein B (apoB), high TC and small low-density lipoprotein (LDL) peak 
particle size. For apoB, p-values of 0.00003 and 0.0007 were obtained for all 
affected individuals and for affected men for the susceptibility haplotype 1-1 in the
gamete competition analysis. For TC, the p-values were 0.0001 and 0.007; and for 
LDL peak particle size, 0.002 and 0.01, respectively. These results together with the 
results obtained for FCHL suggest that the underlying gene is not affecting TGs 
alone but also the complex FCHL phenotype. 

EXAMPLE 3: HAPLOTYPE ANALYSES OF THE JAM1-USF1 GENE REGION 

Using the HBAT program we obtained evidence for shared haplotypes in the region 
of usf1s1 and usf1s2 (Table 3). This observation was supported by multipoint HHRR
analyses (Table 3). For the haplotype 1-1 (1 indicating the common allele) a p-value 
of 0.0007 was obtained using the -o option. 

Table 3. Haplotype analyses in TG-affected men using the HBAT program (the multilocus
geno-PDT and multi-HHRR results are given below for comparison).

The inter-SNP distances and corresponding rs numbers for the SNPs 
jam1s4-s6 and usf1s1-s5 are shown in Table 2; 1 indicates the common 
allele; and ns non-significant. The p-value of the HBAT program indicates 
the probability that the particular haplotype is transmitted to the affected 
individuals using the option -o (optimize offset) or option -e (empirical test). 
Multilocus geno-PDT indicates a genotype-based association test for 
general pedigrees. The multi-HHRR analysis is testing the hypothesis of 
homogeneity of marker allele distributions between transmitted and non- 
transmitted alleles of the SNPs. 



Test                  Haplotype of SNPs:         Haplotype of SNPs:               Haplotype of SNPs:
                      jam1s4-6 - usf1s1-2        usf1s1-2                         usf1s1-5

HBAT -o               P = [illegible]            P = 0.0007                       P = ns (0.07)
                      (haplotype 1-1-1-1-1)      (haplotype 1-1)                  (haplotype 1-1-1-1-1)
                                                 P = 0.004 for the protective
                                                 haplotype 2-2, significantly
                                                 less transmitted to the
                                                 affected subjects

HBAT -e               P = 0.009                  P = 0.02                         P = ns (0.2)
                      (haplotype 1-1-1-1-1)      (haplotype 1-1)                  (haplotype 1-1-1-1-1)

Multilocus geno-PDT   P = 0.02                   P = 0.002                        P = ns (0.7)

Multi-HHRR            P = 0.0002                 P = 0.00003                      P = 0.04

This option measures not only preferential transmission of the susceptibility
haplotype to affecteds but also less preferential transmissions to unaffecteds,
making it useful here since in these extended families the unaffecteds also contain
important information. The results of the HBAT -e option, a test of association given
linkage, are also shown in Table 3. Since this test statistic implicitly conditions on
linkage information, it is less powerful and leads to less significant p-values. However,
this test together with the results of the HHRR analyses allows us to conclude that the 1-1
haplotype is associated with the phenotype (Table 3). Furthermore, haplotype 2-2
was significantly less transmitted to the affected subjects (p=0.004), suggesting a
protective role for this haplotype. These results were further supported by a
genotype-based association test for general pedigrees, the genotype-PDT, which provided
evidence for association (Table 3), as well as by the gamete competition analyses
(Table 1), where the same haplotype 1-1 was segregating to the affected individuals
with both FCHL and TG traits.

EXAMPLE 4: EXPRESSION PROFILES OF FAT BIOPSIES AND INITIAL
FUNCTIONAL ANALYSIS

We investigated whether the gene expression profiles of fat biopsies from six
affected FCHL family members carrying the susceptibility haplotype 1-1, constructed
by the SNPs usf1s1 and usf1s2, revealed differences when compared to four
affected FCHL family members homozygous for the putative protective haplotype,
2-2 (see above), using the Affymetrix HGU133A probe array. We also specifically
investigated whether USF1 is expressed in fat tissue because it is not sufficiently
represented on the Affymetrix HGU133A chip. Using RT-PCR, USF1 was found
to be expressed in the fat biopsy samples (data not shown). Quantitative real-time
PCR was also performed to determine the relative expression levels of USF1 in
adipose tissue in the affected FCHL family members carrying the risk haplotype and
affected members not carrying the risk haplotype. No detectable differences in
USF1 expression levels could be observed, suggesting that the potential functional
significance of the FCHL-associated allele of USF1 is not mediated via a direct
effect on the steady state transcript level in adipose tissue.

Due to the limited number of samples available, statistical power to detect 
differences in gene expression between the haplotype groups was not considered 
sufficient. As an alternative, we therefore defined cut-off thresholds (see Methods) 
to discriminate between significant differences and differences attributable to 
technical or biological noise in the experimental procedures. Using these criteria, we 
identified 25 genes that appeared up-regulated and 73 genes down-regulated in the 
susceptibility haplotype carriers (the complete lists will be available at our website, 
while the raw data can be accessed through the Gene Expression Omnibus at NCBI 
using the GEO accession GSE590). To lend biological relevance to these findings, 
lists of differentially expressed genes were examined for over-representation of 
functional classes, as defined by the gene ontology (GO) consortium, using the 
Expression Analysis Systematic Explorer (EASE) tool. Only three classes were 
found to be statistically significantly over-represented among the up-regulated 
genes (Fig. 2), primarily implicating genes involved in fat metabolism. Among the 
down-regulated genes, a prominent down-regulation of immune-response genes 
was observed (Fig. 2). The complete results from the EASE analysis, including the 
corresponding EASE scores (p-values) and lists of genes in the significant
(p-value < 0.05) functional categories, are given in the Supplementary Table 3a-b.
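
The cut-off rule stated in the Methods (a ratio lying at least three standard deviations from the mean of the log10 ratio distribution) can be sketched as follows. This is a minimal illustration, not the GeneSpring workflow: the function name, the dictionary input and the use of the sample standard deviation are assumptions made for the example.

```python
import math
import statistics

def flag_differential(ratios, n_sd=3.0):
    """Split genes into up- and down-regulated sets using an n_sd cut-off on log10 ratios.

    ratios maps a gene identifier to the ratio of averaged normalized intensities
    (susceptibility haplotype carriers divided by protective haplotype carriers).
    """
    logs = {gene: math.log10(ratio) for gene, ratio in ratios.items()}
    mean = statistics.mean(logs.values())
    sd = statistics.stdev(logs.values())      # sample standard deviation (assumption)
    up = sorted(g for g, v in logs.items() if v >= mean + n_sd * sd)
    down = sorted(g for g, v in logs.items() if v <= mean - n_sd * sd)
    return up, down
```

With most genes near a ratio of 1, only genes whose log10 ratio departs strongly from the bulk of the distribution are reported, which is the intent of a noise-based threshold when formal testing is underpowered.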

Next we investigated the genomic sequence flanking the haplotype 1-1, and
identified a 60-bp sequence element found in 91 human genes as follows: The SNP 
usf1s2, forming part of the haplotype 1-1, resides adjacent (8 bp) to a 306-bp AluSx 
repeat. Two parts (2-61 bp and 137-196 bp) of this AluSx repeat show sequence 
similarity with the mouse B1 repeat (Fig. 3a). When blasted against the mouse 
sequence databases, these two parts of the AluSx sequence identify numerous 
mouse ESTs, due to the B1 element located in the untranslated region of the mouse 
mRNA. When blasted against human sequence databases, 91 human genes, 
including USF1, have this 60-bp part of AluSx either on the coding strand (43 
genes) or on the opposite strand (48 genes). The 60-bp part is highly conserved 
from human to worm since it was found in pufferfish and Caenorhabditis elegans but 
not in Drosophila melanogaster or in Saccharomyces cerevisiae. A complete list of 
the 91 human genes as well as their individual p-values and identity percentages 
(between 83-98%) are given in Supplementary Table 4. Analysis of domain 
annotation of the 91 genes indicates enrichment of domains involved in protein 
modification (n=16) and domains related to nucleic acids (n=35). This observation 
was also supported by the available annotations about biological process, where
the majority of the genes were involved in nucleic acid metabolism (n=18), as well as in
transcription and signal transduction (n=33). 

To obtain some evidence for the functional significance of this conserved 60-bp 
DNA element, we produced a 268-bp long construct containing the critical 60-bp 
sequence as well as the usf1s2 SNP region and tested its regulatory function in vitro 
using the SEAP reporter system (Fig. 3b). The genomic DNAs from one 
homozygous susceptibility carrier (haplotype 1-1) and one homozygous non-carrier 
(2-2) were cloned in front of the SEAP reporter gene in two orientations. An effect
on the transcription of the reporter gene was observed in the forward orientation in
both constructs, whereas the reverse orientation resulted in a transcription
efficiency comparable to the negative control (Fig. 3b).

The purpose of this experiment was not to resolve whether the usf1s2 SNP is directly
causative of FCHL. More complex functional studies need to be performed before
any conclusions about the functional significance of a single non-coding SNP can be
drawn. However, these preliminary data combined with the cross-species
conservation would imply that the DNA region flanking the susceptibility haplotype
contains an element affecting transcriptional regulation. The data also suggest that
the element is more likely to be a cis-acting regulator than a
direction-independent enhancer element.



EXAMPLE 5: EXPERIMENTAL SETUP - METHODS 

The Finnish FCHL families were recruited in the Helsinki, Turku and Kuopio
University Central Hospitals, as described earlier 4-9. Each subject provided written
informed consent prior to participating in the study. All samples were collected in
accordance with the Helsinki declaration, and the ethics committees of the
participating centers approved the study design. The inclusion criteria for the FCHL
probands were as follows 4: 1) serum TC and/or TGs > 90th age-sex specific Finnish
population percentiles 4, but if the proband had only one elevated lipid trait, a
first-degree relative had to have the combined phenotype; 2) age > 30 years and < 55
years for males and < 65 years for females; 3) at least a 50% stenosis in one or more
coronary arteries in coronary angiography. Exclusion criteria for the FCHL probands
were type 1 DM, hepatic or renal disease, and hypothyroidism. Familial
hypercholesterolemia was excluded from each pedigree by determining the
LDL-receptor status of the proband by the lymphocyte culture method 4. If the
above-mentioned criteria were fulfilled, families with at least two affected members were
included in the study, and all the accessible family members were examined. Two
traits were analyzed: FCHL and TGs. For the FCHL trait, family members were
scored as affected according to the same diagnostic criteria as in our original
linkage study 4 using the Finnish age-sex specific 90th percentiles for high TC and
high TGs, available from the web site of the National Public Health Institute, Finland.
These ascertainment criteria are fully comparable with the original criteria 1. For
analysis of TGs, family members with TG levels > 90th Finnish age-sex specific
population percentile were coded as affected. In addition to the FCHL and TG traits,
the combination of the usf1s1-usf1s2 SNPs, which resulted in the significant
haplotypes for the FCHL and TG traits, was also analyzed using the apolipoprotein
B (apoB), LDL peak particle size and TC traits. For apoB and TC, the 90th age-sex
specific Finnish population percentiles, publicly available from the web site of the
National Public Health Institute, Finland, were used. For LDL peak particle size, the
cut point of 25.5 nm was used to code individuals with small LDL particles as
affected. Although LDL-C is an important component trait of FCHL, serum TC was
used instead in the ascertainment of the Finnish FCHL families as well as in the
statistical analyses of the SNPs forming the USF1 susceptibility haplotype. The
reasoning for this is the significant hypertriglyceridemia associated with FCHL. The
Friedewald formula is generally not recommended when TGs are over 400 mg/dl
(i.e. 4.4 mmol/l), which is often the case with hypertriglyceridemic FCHL family
members. In addition, the population percentile points of LDL-C could not be
estimated when including this factor, as we currently do not have population
percentiles for LDL-C.
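
For orientation, the Friedewald estimate referred to above is commonly written as LDL-C = TC - HDL-C - TG/5 with all values in mg/dl. The sketch below is illustrative only; the function name is hypothetical, and the refusal to return a value above 400 mg/dl simply mirrors the limit mentioned in the text.

```python
TG_LIMIT_MG_DL = 400.0  # limit above which the estimate is generally not recommended

def friedewald_ldl_mg_dl(total_cholesterol, hdl_cholesterol, triglycerides):
    """Estimate LDL-C (mg/dl) as TC - HDL-C - TG/5; return None when TGs exceed the limit."""
    if triglycerides > TG_LIMIT_MG_DL:
        return None
    return total_cholesterol - hdl_cholesterol - triglycerides / 5.0
```

For a hypertriglyceridemic family member with TGs of, say, 450 mg/dl the function returns None, which illustrates why serum TC rather than calculated LDL-C was the practical choice for these families.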

Biochemical analyses 

Serum lipid parameters and LDL peak particle size were measured as described
earlier 4,9,39. Probands or hyperlipidemic relatives who used lipid-lowering drugs were
studied after their treatment was withheld for 4 weeks. In the 60 FCHL families,
DNA and lipid measurements were available for 721 and 771 family members,
respectively. In these 60 FCHL families, there were 226 individuals with TC > 90th
age-sex specific Finnish population percentile, 220 with TGs > 90th age-sex specific
percentile, 321 with TC and/or TGs > 90th age-sex specific percentile, and 125
individuals with both TC and TGs > 90th age-sex specific percentiles. A
total of 96 men and 124 women exhibited high TGs (> 90th age-sex percentile).
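
The counts reported above are mutually consistent, which can be checked with the inclusion-exclusion rule: the number with high TC and/or high TGs equals the number with high TC plus the number with high TGs minus the number with both. The helper below is a hypothetical illustration of that arithmetic.

```python
def count_either(n_first, n_second, n_both):
    """Inclusion-exclusion: individuals with the first and/or the second trait."""
    return n_first + n_second - n_both

high_tc, high_tg, high_both = 226, 220, 125
high_either = count_either(high_tc, high_tg, high_both)   # 226 + 220 - 125
high_tg_by_sex = 96 + 124                                  # men + women with high TGs
```

Both derived values agree with the figures given in the text (321 individuals with high TC and/or TGs, and 220 individuals with high TGs).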

Sequencing, genotyping and sequence annotations 

The TXNIP gene was sequenced in the 60 FCHL probands and the APOA2, RXRG,
and USF1 genes in the 31 probands of the original linkage study 4. For TXNIP and
USF1, 2000 bp upstream from the 5' end of the gene were also sequenced. For
USF1, the DNA binding domain was also sequenced in the remaining 29 probands.
For all genes, both exons and introns were sequenced, except for the large
44,261-bp RXRG gene where only exons and 100 bp exon-intron boundaries were
sequenced. Sequencing was done in both directions to identify heterozygotes
reliably. Sequencing was performed according to the Big Dye Terminator Cycle
Sequencing protocol (Applied Biosystems), with minor modifications, and the
samples were separated with the automated DNA sequencer ABI 377XL (Applied
Biosystems). Sequence contigs were assembled through use of Sequencher
software (GeneCodes). The dbSNP and CELERA databases were used to select
SNPs. Pyrosequencing and solid-phase minisequencing techniques were applied
for SNP genotyping, as described earlier 4,40. Pyrosequencing was performed using
the PSQ96 instrument and the SNP Reagent kit (Pyrosequencing AB). Every SNP
was first genotyped in a subset of 46 family members from 18 of the 60 FCHL
families. If the SNP was polymorphic (minor allele frequency > 10% in this subset),
the SNP was genotyped in 238 family members of 42 FCHL families, including the
31 FCHL families of the original linkage study 4. This strategy was not applied for
the TXNIP gene, all variants of which had a minor allele frequency < 10%. The
physical order of the markers and genes was determined using the UCSC Genome
Browser. The novel SNPs characterized in this study will be submitted to public
databases (NCBI). All SNPs were tested for possible violation of Hardy-Weinberg
equilibrium (HWE) in three groups (all family members, probands, and spouses)
using the HWSNP program developed by Dr. Markus Perola at the National Public
Health Institute of Finland. Annotation data of the Alu elements were downloaded
from the UCSC Genome Browser, which uses RepeatMasker to screen DNA
sequences for interspersed repeats. The positions of the 60-bp sequence on these
Alu elements were identified using BLAST. Other annotation data were
downloaded from LocusLink.
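
The text does not describe how the HWSNP program tests for HWE. As a generic illustration only, a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts could look like the sketch below; the function name and return values are assumptions, and the test strictly applies to unrelated individuals (for example the spouse group), not to all members of a pedigree.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic SNP.

    Returns (chi2, p_value, minor_allele_frequency).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)           # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value, min(p, q)
```

The returned minor allele frequency can also be compared with the 10% threshold used in the screening step described above.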



Expression array analysis of adipose tissue 

Six affected FCHL family members exhibiting the susceptibility haplotype (see
Results) and four affected FCHL family members homozygous for the protective
haplotype were selected for assessment of gene expression. All six susceptibility
haplotype carriers were from six individual families. The four homozygous protective
haplotype carriers were two sibpairs from two families. Biopsies were taken from
umbilical subcutaneous adipose tissue under local anaesthesia to collect 50-2000
mg of adipose tissue. The RNA was extracted using RNA STAT-60 reagent
(Tel-Test, Inc.), according to the manufacturer's instructions, followed by DNase I
treatment and additional purification with RNeasy Mini Kit columns (Qiagen). The
quality of the RNA was assessed using the RNA 6000 Nano assay in the
Bioanalyzer (Agilent), monitoring the 28S/18S ribosomal RNA ratio and signs of
degradation. The concentration and the A260/A280 ratio of the samples were
measured using a spectrophotometer, the acceptable ratio being 1.8-2.2. Then 2 µg
of total RNA was reverse transcribed to cDNA using the SuperScript Choice System
(Invitrogen) and T7-oligo(dT)24 primer, according to instructions provided by
Affymetrix, except using 60 pmol of primer and a reaction volume of 10 µl, after
which biotin-labeled cRNA was created using the Enzo® BioArray™ High Yield™ RNA
Transcript Labeling Kit (Affymetrix). Prior to hybridization the cRNA was fragmented
to obtain a transcript size distribution of 50 to 200 bases, after which samples were
hybridized to Affymetrix Human Genome U133A arrays and scanned in accordance
with the manufacturer's recommendations.

Scanned images were analyzed with Affymetrix Microarray Suite 5 (Affymetrix,
Santa Clara, CA) software employing the Statistical Expression Algorithm. All
analysis parameters were set to the default values recommended by Affymetrix.
Global scaling to a target intensity of 100 was applied to all arrays but no further
normalizations were performed at this point. Output files of result metrics, including
the scaled signal intensity values and the corresponding detection call expressed as
absent, marginal or present, were further processed using GeneSpring 5.0 data
analysis software (Silicon Genetics, Redwood City, CA). For each probe array a
per-gene normalization was applied so that signal intensities were divided by the
median intensity calculated using all 10 probe arrays. Cut-off values to discriminate
low quality data were determined separately for each haplotype group by dividing
the base value with the proportional value estimated using the Cross Gene Error
Model implemented in GeneSpring. To identify differentially expressed genes
between the two haplotypes, ratios of averaged normalized intensities were
calculated. Differences were considered significant if the resulting ratio fell at
least three standard deviations outside the average ratio calculated from the
distribution of the log10 of the ratios. To further increase result stringency only genes
scored as present in all 10 samples, or as absent or marginal in all cases and
present in all the controls (or vice versa), were included. Annotation information
defining the biological processes that each gene could be ascribed to was retrieved
from the classifications provided by the gene ontology (GO) consortium 41. Statistical
evaluation of enrichment of categories represented in each gene list, compared to
the proportion observed in the total population of genes on the probe array, was
performed using the Expression Analysis Systematic Explorer (EASE) tool 41, with
the threshold value set to 3. The test statistic was calculated using Fisher's exact
test. To maximize robustness, an EASE score (p-value) was calculated where the
Fisher exact probabilities were adjusted so that categories supported by few genes
were strongly penalized, while categories supported by many genes were negligibly
penalized. EASE scores (p-values) falling below 0.05 were considered statistically
significant.
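
The penalizing adjustment described above can be illustrated with a short sketch. It assumes the commonly described form of the EASE score, in which one gene is removed from the category hits in the list before the one-sided Fisher exact (hypergeometric) probability is computed; this is an assumption about the tool, not a statement taken from the text, and the function names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n_list, n_category, n_population):
    """P(X >= k) for the number of category genes among n_list genes drawn without replacement."""
    top = min(n_list, n_category)
    total = comb(n_population, n_list)
    favourable = sum(
        comb(n_category, i) * comb(n_population - n_category, n_list - i)
        for i in range(max(k, 0), top + 1)
    )
    return favourable / total

def fisher_enrichment_p(list_hits, n_list, n_category, n_population):
    """One-sided Fisher exact probability of seeing at least list_hits category genes."""
    return hypergeom_upper_tail(list_hits, n_list, n_category, n_population)

def ease_score(list_hits, n_list, n_category, n_population):
    """Fisher probability after removing one category gene from the list (assumed adjustment)."""
    return hypergeom_upper_tail(list_hits - 1, n_list - 1, n_category - 1, n_population - 1)
```

Removing a single gene changes the probability substantially when a category rests on two or three genes and hardly at all when it rests on dozens, which is the penalizing behaviour described above.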



Quantitative real-time PCR analysis of USF1 

Two affected FCHL family members exhibiting the susceptibility haplotype and two
affected FCHL family members without the haplotype were selected for assessment
of USF1 expression in adipose tissue utilizing the SYBR-Green assay (Applied
Biosystems). Two-step RT-PCR was done using the TaqMan Gold RT-PCR kit
according to the manufacturer's recommendations. A total of 1 µg of RNA was
converted to cDNA in a 100 µl reaction, of which 1 µl was used in the quantitative
PCR reaction. The ratio of USF1 to two housekeeping genes, GAPDH and HPBGD,
was used to normalize the data. The specificity of the reaction was evaluated using
a dissociation curve in addition to a no-template control. The following PCR primers
were used in separate 10 µl SYBR-Green reactions:
For USF1; forward: 5'-ATGACGTGCTTCGACAACAG-3', reverse: 5'-GGGCTATCTGCAGTTCTTGG-3'.
For GAPDH; forward: 5'-CGGAGTCAACGGATTTGGTCGTAT-3', reverse: 5'-AGCCTTCTCCATGGTGGTGAAGAC-3'.
For HPBGD; forward: 5'-AACCCTCATGATGCTGTTGTC-3', reverse: 5'-TAGGATGATGGCACTGAACTC-3'.
The reactions were run in triplicate using the ABI Prism 7900 HT Sequence
Detection System in accordance with the manufacturer's recommendations and the
data were analyzed using Sequence Detector version 2.0 software.
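
The text states only that the ratio of USF1 to the two housekeeping genes was used for normalization. One common way to express such a ratio from threshold cycle (Ct) values is sketched below; it assumes a comparative Ct calculation with perfect amplification efficiency (a doubling per cycle), which is an assumption of this example and not a description of the Sequence Detector analysis.

```python
from statistics import mean

def relative_expression(ct_target, ct_references):
    """Target quantity relative to reference genes, assuming a doubling per PCR cycle.

    ct_target: triplicate Ct values for the target gene (here USF1).
    ct_references: one list of triplicate Ct values per housekeeping gene.
    """
    ct_t = mean(ct_target)
    # Averaging Ct values across reference genes corresponds to the geometric
    # mean of their quantities.
    ct_r = mean(mean(replicates) for replicates in ct_references)
    return 2.0 ** (ct_r - ct_t)
```

Comparing this value between carriers and non-carriers of the risk haplotype gives the relative expression difference; equal values correspond to the "no detectable difference" outcome reported in Example 4.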

Initial functional analysis 

Initial functional analyses were performed using the SEAP reporter system 
(Clontech Laboratories, Palo Alto, CA) in COS cells. This system utilizes SEAP, a 
secreted form of human placental alkaline phosphatase, as a reporter molecule to 
monitor the activity of potential promoter and enhancer sequences. The constructs 
were cloned into the pSEAP2-Enhancer vector which contains the SV40 enhancer. 
The correct allele and orientation in each construct was verified by sequencing. Cell 
culture media between 48 h and 72 h after transfection were taken for the SEAP 
reporter assay. The monitoring of the SEAP protein was performed using the 
fluorescent substrate 4-methylumbelliferyl phosphate (MUP) in a fluorescent assay 
according to the manufacturer's instructions. Data are representative of at least two 
independent experiments. 

Statistical analyses 

Parametric linkage and nonparametric affected sib-pair (ASP) analyses were carried
out using the same programs and parameters as in the original linkage study 4. Two
traits were investigated, the FCHL and TG traits. The MLINK program of the
LINKAGE package 43, version FASTLINK 4.1P 44,45, was used as implemented by the
ANALYZE package 46 to perform the parametric two-point and multipoint linkage
analyses. The ASP analysis was performed using the SIBPAIR program of the
ANALYZE package 46. For each marker, allele frequencies were estimated from all
individuals using the DOWNFREQ program 47.
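
As background to the lod scores reported in the Examples, the two-point lod score for the simplest case of phase-known, fully informative meioses is log10 of the likelihood at recombination fraction theta divided by the likelihood at theta = 0.5. The sketch below covers only this textbook case; MLINK evaluates full pedigree likelihoods under a disease model, which is not reproduced here, and the function names are hypothetical.

```python
import math

def lod_score(recombinants, meioses, theta):
    """Two-point lod score for phase-known meioses at recombination fraction theta."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant excludes complete linkage.
        return float("-inf") if recombinants > 0 else meioses * math.log10(2.0)
    return (recombinants * math.log10(theta)
            + non_recombinants * math.log10(1.0 - theta)
            - meioses * math.log10(0.5))

def max_lod(recombinants, meioses, steps=50):
    """Maximum lod score over a grid of theta values from 0 to 0.5."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(lod_score(recombinants, meioses, theta) for theta in grid)
```

Ten non-recombinant meioses, for example, give a lod score of about 3.0 at theta = 0, the conventional threshold for significant linkage.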

The SNPs were tested for association using the HHRR 27 and the gamete 
competition test 29 . To minimize the number of tests performed, the SNPs residing 
outside the USF1-JAM1 region were tested for association only using the HHRR 27 
test when analyzing the TG- and FCHL-affected males. The HHRR analysis, 
performed by use of the HRRLAMB program 48 , tests the homogeneity of marker 
allele distributions between transmitted and non-transmitted alleles. The
multi-HHRR analysis is testing the same hypothesis using several SNPs. The gamete
competition test is a generalization of the TDT and views transmission of marker 
alleles to affected children as a contest between the alleles, making effective use of 
full pedigree data. The gamete competition method is not purely a test of 
association, because the null hypothesis is no association and no linkage, and thus 
linkage in itself also affects the observed p-value. Furthermore, the gamete 
competition test readily extends to two linked markers, enabling simultaneous 
analysis of multiple SNPs in a gene. P-values based on asymptotic approximations 
can be biased when the data used to calculate them are relatively sparse. To confirm 
that the gamete competition results are indeed significant, we also calculated 
empirical p-values for all analyses involving multiple SNPs (Table 1) using gene 
dropping. In gene dropping, the founder genotypes are assigned using the 
estimated allele frequencies, assuming Hardy-Weinberg equilibrium (HWE) and linkage equilibrium (LE). The 
offspring genotypes are assigned assuming Mendelian segregation. Thus gene 
dropping is performed under the null hypothesis of LE and no linkage. To calculate 
an empirical p-value, gene dropping is performed multiple times. Here at least 
50,000 simulations were performed for each analysis. The likelihood ratio test 
statistic (LRT) from each gene dropping iteration is compared to the LRT for the 
observed data. The empirical p-value is the proportion of iterations in which the 
gene dropping LRT equaled or exceeded the observed LRT. In general, the 
obtained empirical p-values of gene dropping are more conservative than 
asymptotic p-values for small sample sizes. 
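
The gene dropping procedure described above can be sketched for a single biallelic marker in parent-child trios. This is a minimal sketch under stated assumptions, not the procedure actually used: the trio structure, the allele coding (1 and 2), all function names and the toy statistic are hypothetical, and the toy statistic stands in for the likelihood ratio test statistic, which is not reproduced here.

```python
import random


def draw_founder(freq_allele1, rng):
    """Draw a founder genotype under HWE: two independent allele draws."""
    return tuple(1 if rng.random() < freq_allele1 else 2 for _ in range(2))


def drop_to_child(mother, father, rng):
    """Mendelian segregation: one randomly chosen allele from each parent."""
    return (rng.choice(mother), rng.choice(father))


def simulate_trio(freq_allele1, rng):
    """One gene-dropping replicate for a trio under the null hypothesis."""
    mother = draw_founder(freq_allele1, rng)
    father = draw_founder(freq_allele1, rng)
    return mother, father, drop_to_child(mother, father, rng)


def toy_statistic(trios):
    """Placeholder statistic: copies of allele 1 carried by the children."""
    return sum(child.count(1) for _, _, child in trios)


def empirical_p_value(observed_stat, simulated_stats):
    """Proportion of simulated statistics that equal or exceed the observed one."""
    if not simulated_stats:
        raise ValueError("at least one simulation is required")
    hits = sum(1 for s in simulated_stats if s >= observed_stat)
    return hits / len(simulated_stats)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
null_stats = [
    toy_statistic([simulate_trio(0.3, rng) for _ in range(20)])
    for _ in range(1000)
]
p_value = empirical_p_value(18, null_stats)  # 18 is a made-up observed value
```

The comparison uses "equal or exceed", matching the definition of the empirical p-value given in the text.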

The HBAT program, with the options optimize offset (-o) and empirical test (-e), was 
used to test for association between haplotypes and the trait 49 . The option -o 
measures not only preferential transmission of the susceptibility haplotype to 
affecteds but also less preferential transmission to unaffecteds. The -e option leads 
to a test of association given linkage and thus gives an empirical estimate of the 
variance. These haplotype analyses are affected by the fact that four of the 15 
SNPs for the JAM1-USF1 region were genotyped in the 60 extended FCHL families 
and 11 SNPs in 42 nuclear FCHL families. The genotype Pedigree Disequilibrium 
Test (geno-PDT) 50 , which provides a genotype-based association test for general 
pedigrees, was also performed for a combination of genotypes from selected USF1 
SNPs (Table 3). LD between the marker genotypes for SNPs in the JAM1-USF1 
region was tested using the Genepop v3.1b program, option 2, at their web site. In 
this program, one test of association is performed for genotypic LD, and the null 
hypothesis is that genotypes at one locus are independent from the genotypes at 
the other locus. The program creates contingency tables for all pairs of loci in each 
population and performs Fisher's exact test for each table using a Markov chain. 
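
To make the test on a contingency table concrete, the sketch below computes a two-sided Fisher's exact test for the simplest case, a 2 x 2 table, by enumerating every table with the same margins. This is a deliberate simplification and an assumption for illustration: Genepop handles larger genotype tables and uses a Markov chain rather than full enumeration, and the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    The p-value is the total probability, under fixed margins, of all
    tables that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        # Small tolerance so ties with the observed table are included.
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: genotype class at locus 1 (rows) by locus 2 (columns).
p_ld = fisher_exact_2x2(3, 0, 0, 3)
```

Full enumeration is only practical for small tables, which is one reason a Markov chain approach is used for larger ones.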



URLS 

Supplementary Tables 1-4 and further details on microarray data will be available at 
our web site (www.genetics.ucla.edu/labs/pajukanta/fchl/chr1/). The raw data for the 
complete set of probe arrays can be accessed through the Gene Expression 
Omnibus at NCBI (www.ncbi.nlm.nih.gov/geo) using the GEO accession GSE590. 
The Finnish 90th age-sex specific percentile values for TC and TGs are available at 
the web site of the National Public Health Institute of Finland 
(www.ktl.fi/molbio/wwwpub/fchl/genomescan). We used dbSNP (available at 
www.ncbi.nlm.nih.gov) and CELERA (www.celera.com) for SNP selection; the 
UCSC Genome Browser (genome.ucsc.edu) for physical order of the genes and for 
annotation of the Alu element; BLAST (www.ncbi.nlm.nih.gov/blast/) for blasting 
sequences against human and mouse databases; LocusLink 
(www.ncbi.nlm.nih.gov/LocusLink/) to download annotation data; and Genepop 
(wbiomed.curtin.edu.au/genepop/index.html) to calculate intermarker LD. 

CLAIMS 

1. A nucleic acid molecule comprising a chromosomal region contributing to or 
indicative of hyperlipidemias and/or dyslipidemias and/or defective 
carbohydrate metabolism, wherein said nucleic acid molecule is selected from 
the group consisting of: 

(a) a nucleic acid molecule having or comprising the nucleic acid 
sequence of SEQ ID NO: 1, wherein said nucleic acid sequence has 
one or more mutations having an effect on USF1 function; 

(b) a nucleic acid molecule having or comprising the nucleic acid 
sequence of SEQ ID NO: 1, wherein said nucleic acid sequence is 
characterized by comprising a guanine or an adenine residue in 
position 3966 in intron 7 of the USF1 sequence; and/or 

(c) a nucleic acid molecule having or comprising the nucleic acid 
sequence of SEQ ID NO: 1, wherein said nucleic acid sequence is 
characterized by comprising a cytosine or a thymine residue in position 
5205 in exon 11 of the USF1 sequence; 

wherein said nucleic molecule extends, at a maximum, 50000 nucleotides over 
the 5' and/or 3' end of the nucleic acid molecule of SEQ ID NO: 1. 

2. The nucleic acid molecule of claim 1 which is genomic DNA. 

3. A fragment of the nucleic acid molecule of claim 1 or 2 having at least 20 
nucleotides wherein said fragment comprises nucleotide position 3966 and/or 
position 5205 of SEQ ID NO:1. 

4. A nucleic acid molecule which is complementary to the nucleic acid molecule 
of any one of claims 1 to 3 and which has a length of at least 20 nucleotides. 

5. A vector comprising the nucleic acid molecule of any one of claims 1 to 4. 

6. A primer or primer pair, wherein the primer or primer pair hybridizes under 
stringent conditions to the nucleic acid molecule of any one of claims 1 to 4 
comprising nucleotide positions 3966 and 5205 of SEQ ID NO:1 or to the 
complementary strand thereof. 

7. A non-human host transformed with the vector of claim 5. 

8. The non-human host of claim 7 which is a bacterium, a yeast cell, an insect 
cell, a fungal cell, a mammalian cell, a plant cell, a transgenic animal or a 
transgenic plant. 

9. A pharmaceutical composition comprising USF1 or a fragment thereof, a 
nucleic acid molecule encoding USF1 or a fragment thereof, or an antibody 
specific for USF1. 

10. A diagnostic composition comprising a nucleic acid molecule encoding USF1 
or a fragment thereof, the nucleic acid molecule of any one of claims 1 to 4, the 
vector of claim 5, the primer or primer pair of claim 6 or an antibody specific for 
USF1. 

11. A method for testing for the presence or predisposition of hyperlipidemia 
and/or dyslipidemia and/or defective carbohydrate metabolism, comprising 
analyzing a sample obtained from a prospective patient or from a person 
suspected of carrying such a predisposition for the presence of a wild-type or 
variant allele of the USF1 gene. 

12. The method of claim 11, wherein said variant comprises an SNP at position 
3966 and/or at position 5205 of the USF1 gene in a homozygous or 
heterozygous state. 

13. The method of claim 11 or 12, wherein said testing comprises hybridizing the 
complementary nucleic acid molecule of claim 4 under stringent conditions to 
nucleic acid molecules comprised in a sample and detecting said hybridization, 
wherein said complementary nucleic acid molecule comprises the sequence 
position containing the SNP. 

14. The method of any one of claims 11 to 13 further comprising digesting the 
product of said hybridization with a restriction endonuclease or subjecting the 
product of said hybridization to digestion with a restriction endonuclease and 
analyzing the product of said digestion. 

15. The method of claim 14, wherein said probe is detectably labeled. 

16. The method of any one of claims 11 to 15, wherein said testing comprises 
determining the nucleic acid sequence of at least a portion of the nucleic acid 
molecule of any one of claims 1 to 4, wherein said portion comprises the 
position of the SNP. 

17. The method of claim 16, wherein the determination of the nucleic acid 
sequence is effected by solid-phase minisequencing. 

18. The method of claim 17 further comprising, prior to determining said nucleic 
acid sequence, amplification of at least said portion of said nucleic acid 
molecule. 

19. The method of any one of claims 11 to 15, wherein said testing comprises carrying out an 
amplification reaction wherein at least one of the primers employed in said 
amplification reaction is the primer of claim 6 or belongs to the primer pair of 
claim 6, comprising assaying for an amplification product. 

20. The method of claim 19 wherein said amplification is effected by or said 
amplification is the polymerase chain reaction (PCR). 

21. A method for testing for the presence or predisposition of hyperlipidemia 
and/or dyslipidemia and/or defective carbohydrate metabolism comprising 
assaying a sample obtained from a human for the amount of USF1 contained 
in said sample. 

22. The method of claim 21, wherein said testing is effected by using an antibody 
or aptamer specific for USF1. 

23. The method of claim 22, wherein said antibody or aptamer is detectably 
labeled. 

24. The method of any one of claims 21 to 23, wherein the test is an 
immunoassay. 

25. The method of any one of claims 11 to 24, wherein said sample is blood, 
serum, plasma, fetal tissue, saliva, urine, mucosal tissue, mucus, vaginal 
tissue, fetal tissue obtained from the vagina, skin, hair, hair follicle or another 
human tissue. 

26. The method of any one of claims 11 to 25, wherein the nucleic acid molecule 
or protein from said sample is fixed to a solid support. 

27. The method of claim 26, wherein said solid support is a chip, a silica wafer, a 
bead or a microtiter plate. 

28. Use of the nucleic acid molecule of any one of claims 1 to 5 for the analysis of 
the presence or predisposition of hyperlipidemia and/or dyslipidemia and/or 
defective carbohydrate metabolism. 

29. Use of USF1 or a fragment thereof or of a nucleic acid molecule encoding 
USF1 and/or comprising at least the wild-type sequence of intron 7 and/or 
exon 11 of USF1, for the preparation of a pharmaceutical composition for the 
treatment of hyperlipidemias and/or dyslipidemias including familial combined 
hyperlipidemia (FCHL), hypercholesterolemia, hypertriglyceridemia, 
hypoalphalipoproteinemia, hyperapobetalipoproteinemia (hyperapoB), familial 
dyslipidemic hypertension (FDH), metabolic syndrome, type 2 diabetes 
mellitus, coronary heart disease, atherosclerosis or hypertension. 

30. Kit comprising the nucleic acid molecule of any one of claims 1 to 5, the primer 
or primer pair of claim 6 and/or the vector of claim 7 in one or more containers. 

Abstract 

The present invention relates to a nucleic acid molecule comprising a chromosomal 
region contributing to or indicative of hyperlipidemias and/or dyslipidemias or 
defective carbohydrate metabolism, wherein said nucleic acid molecule is selected 
from the group consisting of: (a) a nucleic acid molecule having or comprising the 
nucleic acid sequence of SEQ ID NO: 1, wherein said nucleic acid sequence has 
one or more mutations having an effect on USF1 function; (b) a nucleic acid 
molecule having or comprising the nucleic acid sequence of SEQ ID NO: 1, wherein 
said nucleic acid sequence is characterized by comprising a guanine or an adenine 
residue in position 3966 in intron 7 of the USF1 sequence; and/or (c) a nucleic acid 
molecule having or comprising the nucleic acid sequence of SEQ ID NO: 1, wherein 
said nucleic acid sequence is characterized by comprising a cytosine or a thymine 
residue in position 5205 in exon 11 of the USF1 sequence; wherein said nucleic 
molecule extends, at a maximum, 50000 nucleotides over the 5' and/or 3' end of the 
nucleic acid molecule of SEQ ID NO: 1. The present invention further relates to a 
diagnostic composition comprising a nucleic acid molecule encoding USF1 or a 
fragment thereof, the nucleic acid molecule disclosed herein, the vector, the primer 
or primer pair of the present invention or an antibody specific for USF1. Finally, the 
present invention relates to the use of the nucleic acid molecule of the invention for 
the preparation of a pharmaceutical composition for the treatment of hyperlipemia, 
dyslipidemia, coronary heart disease, type II diabetes, metabolic syndrome, 
hypertension or atherosclerosis. 

[Figure 3a: schematic drawing, not reproduced. Labels in the drawing: USF1; USF1 intron 7, 451 bp; associated haplotype usf1s1-usf1s2, 1239 bp; usf1s2; AluSx, 306 bp; B1 (two copies); 2-61 bp, with usf1s3 and usf1s4; 137-196 bp, no SNPs found. 

A total of 91 human genes have this 60 bp sequence, similar to the mouse B1 repeat, as a part of AluSx on the coding strand (43 genes) or on the opposite strand (48 genes).] 

Figure 3a 

<110> National Public Health Institute 
<120> Identification of SNPs associated with hyperlipidemia, dyslipidemia and 
defective carbohydrate metabolism 
<130> K 1114 EP 
<160> 1 
<170> PatentIn version 3.1 

<210> 1 
<211> 5687 
<212> DNA 
<213> Human 

<220> 
<221> variation 
<222> (3966)..(3966) 
<223> r = adenine (a) or guanine (g); 
adenine is wild-type associated; guanine is disease-associated 

<220> 
<221> variation 
<222> (5205)..(5205) 
<223> y = cytosine (c) or thymine (t); 
thymine is wild-type associated; cytosine is disease-associated 

Jtgaaaattt tccttggata ggaaaggttt ggaggacctt atgggtagag aatttccaaa 60 

aatcttgccc cttttgtgtt gggattatct tattgctttg tactgtgtag ctgtttcttt 120 

ctggaggcat gtctgcccag ctctttgttt ttcctgccct ctggctgggt gtcagggtcc 180 
taaggcagag cttgtaggtg gattcttccc cctttgtctc ttcttcagaa ccctgttttt 240 

ttttttttta ccccttcttg ctcaggctta gttgatttgg agttgtcata gcaacatttt 300 

agcaacagtg ttgttctgca ggaaggcttg atgaataaaa tagagaatgc ttgaagagga 360 

tccacttggg ctttagggtt tctaacagat tatataaatc tggatacccc aaaacaagag 420 

tcctgtcagt agaatggggc ccaaatgcca agtctagtct ttgtggtcag ggatattctt 480 

ccagtggtag tgggcttcag atttcctctt cctaggtttg aaaacagaaa tgtcttgatg 540 

gacaacatgt ggctgagaaa ctggaagaag catcagtgtc catgacactg tattttttga 600 

tggtggggcc aatacatggc ccttcctgat tcccatgaag ctgccatcat ggcaggtcat 660 

aatagcttta atgatccatt tagagatgtg ttgttggctg ggtgcggtgg ctcatgcctg 720 

taatccaagc actttgggag gccgaggcag gcggatcacc tgaggtcagg agttccagac 780 

cagcctggcc aatatggtaa aaccccatct ctactgaaaa tacaaaaatt agctgggcgt 840 

ggtggtgggc acctataatc ccagctattc aggaggctga ggcaggagaa tcacttgaac 900 

ccaggagatg gaggttgtaa gccgagattg tgccactgca ctccagcctg ggtgacagag 960 

caagattctg tctcagaaaa aaaaaaaaaa aaaagaaaga aatgtgttgt ttcggccagg 1020 

tgcagtggct cacacctgta atcccagcac tttgggaggc tgccgaggtg gacagatcat 1080 

gctctcagga gttcgagacc agccgggcca acatggtgaa accccgtctc tactaaaaat 1140 

acaaaaatta gccaggcgtg gtggtgtgca cctgtaatcc cagctactcc ggaggctgag 1200 

gcaggagaat cacttgaacc tgggaggcag aggttgcagt gagctgagat cgcgccactg 1260 

cactccagcc tgggtgacag agagagactc tgtctcaaaa aaaaaaaaaa aaaaaaagtg 1320 

ttgtttctgt cttccagtat aattatccac tctccaccag gagttggagt gataatggag 1380 

ggatggggaa cactatttgt agccttgctt tttcaatcac tgtaggccag tcctcaacat 1440 

cagtatggtg gaggctgatt gtcccctgca gatgactggg ttattttcct ggctatgtgt 1500 

tcatggaacc taagttctag aaccagagat actgttctgt ttcctaaact cattgcaaac 1560 

ttcatgattt ctaccaggac ttagcactca ggcctgtgaa tcaggagata caaagacctc 1620 

caaaaaagga ccagttcctc ggatgtgccc cctcacagag agatgaaggg gtgagtgaag 1680 

aagaggtagg gtctgggatg aaagatgggt ggcctggaag aatgcaaaat gaccaagagc 1740 

actgcctctg gagtcaggca gacctggatt caggttctac tctatcactt actgtgtgat 1800 

ttggtttctc tatctataaa atggaagtag tgctatctat ctcgtggtgc tgtttttagt 1860 

actaaataag attacatgta atgtacttag cttagtgctt atgtacatag taaacagtaa 1920 

acactagttg ttattctaac ctaacccagc ttctgttggg aatgccaatg agtttgcagc 1980 

catatgttac tgggccagtg agcttctcat tgacttcttc tcatactctt ccttttgtcc 2040 

tttcaccaca aacaggcagc agaaaacagc tgaaacggaa gaggggacag tgcagattca 2100 

ggaaggtgag tgctagaaac agaaccaaga ctaagaaccc atcatggcct cccttccttc 2160 

cccaccagac catctcctgt gcatcctcct ccttccgtga catgcaaatg gaacgggggt 2220 

agaaaggcag ttaactcaca gacttttcct ttgttctttt aattcaggtg cagtggctac 2280 

tggggaagac ccaaccagtg tggctattgc cagcatccag tcagctgcca ccttccctga 2340 

ccccaacgtc aagtacgtct tccgaactga gaatgggggc caggtaaggg agggggccag 2400 

gtggctgcag gtgttatctg gggttgggat tgagggaggt aattgaacat gtcttgggga 2460 

gacctggctt ggaggatgag ttgaaagagt ggactgttgc aggggaggga ggtgctaata 2520 

ctggagtaga gactggtgtg aggttagatg tatgctgaaa cctctgtgtg gggaaagaag 2580 

ggagaatggc tgaatccatg tctctgaagg actttgtttt ggggccctat ccaagggaag 2640 

ctttatgagg ggccctagga ttcccaacac ttaatctttt cttctctctt cactccctct 2700 

gccttcctct acacttctag gtgatgtaca gggtgatcca ggtgtctgag gggcagctgg 2760 

atggccaaac tgagggaact ggcgccatca gtggctaccc tgccactcaa tccatgaccc 2820 

aggtacaggg tatgggctgg ggaggtcact agagttctga gaagtaagat gaagaaggga 2880 

atcagtagga tgggggtgaa gctaggaaca gtgaggcatc taaggctgcc ttgtcccaaa 2940 

gcactaggct ctccttttct ggatgtttct ctctctctct ctctctctct ccaccctacc 3000 

taccacccca acggatagaa gctgcagagt ggtgtagtgg gaagaagttt ttgactgtta 3060 

ccagaatcag ttttcttgct ccccttccca ggcggtgatc cagggtgctt tcaccagtga 3120 

tgatgcagtt gacacggagg ggacagctgc tgagacgcac tatacttact tccccagcac 3180 

ggcagtggga gatggggcag ggggtaccac atcggggagt acagctgctg ttgttactac 3240 

ccagggctca gaggcactgc tggggcaggc gacccctcct ggcactggtg agatattgca 3300 

tgaggatgct ggctgaaagg gctagaatag gctgtgggac atgactggta ggcagtgagc 3360 

cttcactcat gactcttagt gatcattaag acctggacag gcagtgagtc tggggctgct 3420 

cttctattag catgttcttt ttagaggagg ggaccagggt cttcacctca gggcttggtg 3480 

aggttcctac ccatgtcctg acagaaccta ccctgcatct tcacaggtca attctttgtg 3540 

atgatgtcac cacaagaagt actgcaggga ggaagccagc gctcaattgc ccctaggact 3600 

cacccttatt ccccgtgagt gacccttgtt tcttctcaga ttccgtaagt ggtttttttt 3660 

tttttttttt ttttttgaga cagagtcttg ctctgtcacc caggctggag tgcagtggca 3720 

tgatctcagc tcactgcaac ctctgcttcc agggttcaag cgtttctcat gcctcagcct 3780 

cctgagtagc tggaactaca gacatgtacc accacccctg gctaattttt gtatctttag 3840 

tagagacagg gtttcaccat gttggccagg ctggtctcga actcctgacc tcaagtgatc 3900 

cgcctgcctc ggcctcccaa agtgctggga ttacaggtgt gagacaccac acctagctac 3960 

cataartggt cctaatacct gctaaatctt gtataattcc ttaaccccaa acttcaatca 4020 

tgtattttgt cttcttactc tggccaccct gggctctgtt gtcaggaagt cagaagctcc 4080 

ccggacgact cgggatgaga aacgcagggc tcagcataat gaaggtaggt atgatctggg 4140 

tggagctaga agctgtctgg tgtgatctca gcagtgatgt ctgaggggag gagggattag 4200 

gtaattttac cctgggactt gtggcgagtt ttcactgagt caccttgtcc tccactttgc 4260 

cccacagtgg agcgtcgccg ccgagacaag atcaacaact ggatcgtgca gctctccaag 4320 

ataatcccag actgctctat ggagagcacc aagtctggcc aggtcatgga aagaccctgg 4380 

tagtgggcag gatgcctgaa ttctgcctcc tggtattgtt tccagaaatg gtagagagag 4440 

gggcacacat gacagtagtc ttatctctcc ctgaggttcc tgtatccctg ggagatatta 4500 

taccaccttc cttagatgaa aatgaggtcc aaagtgtgaa cctacttttg gaaagcaagc 4560 

tgggtatctg aaatcctagt tctcattttg ttgaccttat cttgcagagt aaaggtggga 4620 

ttctatccaa agcttgtgat tatatccagg agcttcggca gagtaaccac cgcttgtctg 4680 

aagaactgca gggacttgac caactgcagc tggacaatga cgtgcttcga caacaggtca 4740 

gactcctacc cccagtgcag cccttctcag ttctgctagc cactgaccca gtttgacacc 4800 

ctctactttg ttctccatgg agaaggcttc atcttttccc cctcaccagt ggatgtctga 4860 

atacattcag gggcttggaa gtgccagctt tactacccat tccctttact gcctccttcc 4920 

catgtcaggt ggaagatctt aaaaacaaga atctgctgct tcgagctcag ttgcggcacc 4980 

acggattaga ggtcgtcatc aagaatgaca gcaactaact atggggattc aggggctttg 5040 

ggcccaagaa ctgcagatag cccaggagca acagcctaat cccgtgcccc tttccttcac 5100 

tgccccactt ctggcatggg acagggggaa gttcagaagg tgtgtccttg aactgaggcc 5160 

ctgtgatatg gcggcctgca gtggtgtgaa acacacaatg tggaygtgca ctgacagcct 5220 

tgcccacccc caccatgcag cccctgggcc cttgtgctcc tctcgcacaa tgcatgtgct 5280 

gtctccatgc tggatactgg acacactaaa ctctggggct tgtcctgtgc ttgcttagag 5340 

tgcccagcag aggtttgctg acaggtgatg ctctggcttg ccccaggact ctggcacttc 5400 

cattggttct tcctttccct ggagctgagg tttagatgtg caacctgtgg ctcaggggaa 5460 

caagcttaca caagaagtga gggaaggatg tttagcagtg gctggtgccc atgaagagga 5520 

gattggccag tgagaagctg aggcctatgc agacatctct ggagccagag agaacaacag 5580 

gcaggggccc acttggggcc ttcccccttg tgggggtcgt tttttttttt tcttttcttt 5640 

tttttttttt tttttttttt tttttaagat aaaattgttc aaagcca 5687 
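
The layout of the sequence listing (groups of ten residues followed by a running position, with the ambiguity codes r and y at the variable positions) can be read programmatically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, only the two ambiguity codes used in this listing are handled, and the example rows are a short toy input with made-up numbering rather than the listing itself.

```python
# Residue codes used in this listing: r = a or g, y = c or t.
IUPAC_CODES = {"a": "a", "c": "c", "g": "g", "t": "t", "r": "ag", "y": "ct"}


def parse_listing_rows(rows):
    """Join listing rows into one string, checking each running position."""
    pieces = []
    length = 0
    for row in rows:
        parts = row.split()
        if not parts:
            continue  # skip blank lines between rows
        *groups, printed_position = parts
        chunk = "".join(groups).lower()
        length += len(chunk)
        if int(printed_position) != length:
            raise ValueError(
                f"row ends at {length}, but listing says {printed_position}"
            )
        pieces.append(chunk)
    return "".join(pieces)


def alleles_at(sequence, position):
    """Return the set of bases allowed at a 1-based position."""
    return set(IUPAC_CODES[sequence[position - 1]])


def fragment_around(sequence, position, flank):
    """Return the residues within `flank` bases of a 1-based position."""
    start = max(0, position - 1 - flank)
    return sequence[start:position + flank]


# Toy input only: two short rows, numbered from 1 for illustration.
toy_rows = ["acctagctac cataartggt 20", "", "cctaatacct 30"]
toy_sequence = parse_listing_rows(toy_rows)
toy_alleles = alleles_at(toy_sequence, 16)
toy_fragment = fragment_around(toy_sequence, 16, 2)
```

Checking the printed running position against the accumulated length is a cheap way to catch residues dropped or added during transcription.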